(57) Abstract: Methods and apparatus for decoding codewords (902) using message passing
decoding techniques which are particularly well suited for use with low density parity check
(LDPC) codes and long codewords are described. The described methods allow decoding graph
structures which are largely comprised of multiple identical copies of a much smaller graph
(1000). Copies of the smaller graph are subject to a controlled permutation operation (904) to
create the larger graph structure. The same controlled permutations are directly implemented to
support message passing between the replicated copies of the small graph. Messages
corresponding to individual copies of the graph are stored in a memory and accessed in sets, one
from each copy of the graph, using a SIMD read or write instruction. The graph permutation
operation may be implemented by simply reordering messages, e.g., using a cyclic permutation
operation, in each set of messages read out of a message memory so that the messages are passed
to processing circuits corresponding to different copies of the small graph.
METHODS AND APPARATUS FOR DECODING LDPC CODES

Related Applications 

This application claims the benefit of U.S. Provisional Application S.N.
60/298,480 filed June 15, 2001.

Field of the Invention

The present invention is directed to methods and apparatus for detecting and/or
correcting errors in binary data, e.g., through the use of parity check codes such as low density
parity check (LDPC) codes.

Background 

In the modern information age binary values, e.g., ones and zeros, are used to
represent and communicate various types of information, e.g., video, audio, statistical
information, etc. Unfortunately, during storage, transmission, and/or processing of binary data,
errors may be unintentionally introduced, e.g., a one may be changed to a zero or vice versa.

Generally, in the case of data transmission, a receiver observes each received bit
in the presence of noise or distortion and only an indication of the bit's value is obtained. Under
these circumstances one interprets the observed values as a source of "soft" bits. A soft bit
indicates a preferred estimate of the bit's value, i.e., a one or a zero, together with some
indication of that estimate's reliability. While the number of errors may be relatively low, even
a small number of errors or level of distortion can result in the data being unusable or, in the
case of transmission errors, may necessitate re-transmission of the data.



In order to provide a mechanism to check for errors and, in some cases, to correct
errors, binary data can be coded to introduce carefully designed redundancy. Coding of a unit of
data produces what is commonly referred to as a codeword. Because of its redundancy, a
codeword will often include more bits than the input unit of data from which the codeword was
produced.
When signals arising from transmitted codewords are received or processed, the
redundant information included in the codeword as observed in the signal can be used to identify
and/or correct errors in or remove distortion from the received signal in order to recover the
original data unit. Such error checking and/or correcting can be implemented as part of a
decoding process. In the absence of errors, or in the case of correctable errors or distortion,
decoding can be used to recover from the source data being processed, the original data unit that
was encoded. In the case of unrecoverable errors, the decoding process may produce some
indication that the original data cannot be fully recovered. Such indications of decoding failure
can be used to initiate retransmission of the data.

While data redundancy can increase the reliability of the data to be stored or
transmitted, it comes at the cost of storage space and/or the use of valuable communications
bandwidth. Accordingly, it is desirable to add redundancy in an efficient manner, maximizing
the amount of error correction/detection capacity gained for a given amount of redundancy
introduced into the data.

With the increased use of fiber optic lines for data communication and increases
in the rate at which data can be read from and stored to data storage devices, e.g., disk drives,
tapes, etc., there is an increasing need not only for efficient use of data storage and transmission
capacity but also for the ability to encode and decode data at high rates of speed.

While encoding efficiency and high data rates are important, for an encoding
and/or decoding system to be practical for use in a wide range of devices, e.g., consumer
devices, it is important that the encoders and/or decoders be capable of being implemented at
reasonable cost. Accordingly, the ability to efficiently implement encoding/decoding schemes
used for error correction and/or detection purposes, e.g., in terms of hardware costs, can be
important.

Various types of coding schemes have been used over the years for error
correction purposes. One class of codes, generally referred to as "turbo codes," was recently
invented (1993). Turbo codes offer significant benefits over older coding techniques such as
convolutional codes and have found numerous applications.
In conjunction with the advent of turbo codes, there has been increasing interest
in another class of related, apparently simpler, codes commonly referred to as low density parity
check (LDPC) codes. LDPC codes were actually invented by Gallager some 40 years ago
(1961) but have only recently come to the fore. Turbo codes and LDPC codes are coding
schemes that are used in the context of so-called iterative coding systems, that is, they are
decoded using iterative decoders. Recently, it has been shown that LDPC codes can provide
very good error detecting and correcting performance, surpassing or matching that of turbo
codes for large codewords, e.g., codeword sizes exceeding approximately 1000 bits, given
proper selection of LDPC coding parameters. Moreover, LDPC codes can potentially be
decoded at much higher speeds than turbo codes.

In many coding schemes, longer codewords are often more resilient for purposes
of error detection and correction due to the coding interaction over a larger number of bits.
Thus, the use of long codewords can be beneficial in terms of increasing the ability to detect and
correct errors. This is particularly true for turbo codes and LDPC codes. Thus, in many
applications the use of long codewords, e.g., codewords exceeding a thousand bits in length, is
desirable.

The main difficulty encountered in the adoption of LDPC coding and Turbo
coding in the context of long codewords, where the use of such codes offers the most promise, is
the complexity of implementing these coding systems. In a practical sense, complexity
translates directly into cost of implementation. Both of these coding systems are significantly
more complex than traditionally used coding systems such as convolutional codes and
Reed-Solomon codes.

Complexity analysis of signal processing algorithms usually focuses on
operations counts. When attempting to exploit hardware parallelism in iterative coding systems,
especially in the case of LDPC codes, significant complexity arises not from computational
requirements but rather from routing requirements. The root of the problem lies in the
construction of the codes themselves.
LDPC codes and turbo codes rely on interleaving messages inside an iterative
process. In order for the code to perform well, the interleaving must have good mixing
properties. This necessitates the implementation of a complex interleaving process.

LDPC codes are well represented by bipartite graphs, often called Tanner graphs,
in which one set of nodes, the variable nodes, corresponds to bits of the codeword and the other
set of nodes, the constraint nodes, sometimes called check nodes, corresponds to the set of
parity-check constraints which define the code. Edges in the graph connect variable nodes to
constraint nodes. A variable node and a constraint node are said to be neighbors if they are
connected by an edge in the graph. For simplicity, we generally assume that a pair of nodes is
connected by at most one edge. To each variable node is associated one bit of the codeword. In
some cases some of these bits might be punctured or known, as discussed further below.

A bit sequence associated one-to-one with the variable node sequence is a
codeword of the code if and only if, for each constraint node, the bits neighboring the constraint
(via their association with variable nodes) sum to zero modulo two, i.e., they comprise an even
number of ones.

The decoders and decoding algorithms used to decode LDPC codewords operate
by exchanging messages within the graph along the edges and updating these messages by
performing computations at the nodes based on the incoming messages. Such algorithms will be
generally referred to as message passing algorithms. Each variable node in the graph is initially
provided with a soft bit, termed a received value, that indicates an estimate of the associated
bit's value as determined by observations from, e.g., the communications channel. Ideally, the
estimates for separate bits are statistically independent. This ideal can be, and often is, violated
in practice. A collection of received values constitutes a received word. For purposes of this
application we may identify the signal observed by, e.g., the receiver in a communications
system with the received word.

The number of edges attached to a node, i.e., a variable node or constraint node,
is referred to as the degree of the node. A regular graph or code is one for which all variable
nodes have the same degree, j say, and all constraint nodes have the same degree, k say. In this
case we say that the code is a (j,k) regular code. These were the codes considered originally by
Gallager (1961). In contrast to a "regular" code, an irregular code has constraint nodes and/or
variable nodes of differing degrees. For example, some variable nodes may be of degree 4,
others of degree 3 and still others of degree 2.
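
The degree definitions above can be illustrated with a minimal Python sketch. The edge-list
representation (pairs of variable and constraint indices), the function names and the toy graph
are assumptions made for illustration only; they do not come from the figures.

```python
from collections import Counter

def node_degrees(edges):
    """Count edges per node. `edges` is a list of (variable, constraint) pairs
    (a hypothetical representation chosen for this sketch)."""
    var_deg = Counter(v for v, _ in edges)
    chk_deg = Counter(c for _, c in edges)
    return var_deg, chk_deg

def regular_parameters(edges):
    """Return (j, k) if the graph is (j,k) regular, otherwise None (irregular)."""
    var_deg, chk_deg = node_degrees(edges)
    js = set(var_deg.values())
    ks = set(chk_deg.values())
    if len(js) == 1 and len(ks) == 1:
        return js.pop(), ks.pop()
    return None

# Toy graph: 4 variable nodes, each connected to both of 2 constraint nodes.
toy_edges = [(v, c) for v in range(4) for c in range(2)]
regular = regular_parameters(toy_edges)         # every variable has degree 2, every constraint 4
irregular = regular_parameters(toy_edges[:-1])  # dropping one edge makes the degrees differ
```

Here the full toy graph is (2,4) regular, while removing a single edge yields an irregular graph.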

While irregular codes can be more complicated to represent and/or implement, it
has been shown that irregular LDPC codes can provide superior error correction/detection
performance when compared to regular LDPC codes.



In order to more precisely describe the decoding process we introduce the notion
of a socket in describing LDPC graphs. A socket can be viewed as an association of an edge in
the graph to a node in the graph. Each node has one socket for each edge attached to it and the
edges are "plugged into" the sockets. Thus, a node of degree d has d sockets attached to it. If
the graph has L edges then there are L sockets on the variable node side of the graph, called the
variable sockets, and L sockets on the constraint node side of the graph, called the constraint
sockets. For identification and ordering purposes, the variable sockets may be enumerated
1,...,L so that all variable sockets attached to one variable node appear contiguously. In such a
case, if the first three variable nodes have degrees d1, d2, and d3 respectively, then variable
sockets 1,...,d1 are attached to the first variable node, variable sockets d1+1,...,d1+d2 are
attached to the second variable node, and variable sockets d1+d2+1,...,d1+d2+d3 are attached to
the third variable node. Constraint node sockets may be enumerated similarly 1,...,L with all
constraint sockets attached to one constraint node appearing contiguously. An edge can be
viewed as a pairing of sockets, one of each pair coming from each side of the graph. Thus, the
edges of the graph represent an interleaver or permutation on the sockets from one side of the
graph, e.g., the variable node side, to the other, e.g., the constraint node side. The permutations
associated with these systems are often complex, reflecting the complexity of the interleaver as
indicated above, requiring complex routing of the message passing for their implementation.
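
As an illustrative sketch of the contiguous socket numbering described above, the following
Python function maps a list of node degrees to the 1-based socket range owned by each node. The
function name and the example degrees are hypothetical.

```python
def socket_ranges(degrees):
    """Return the (first, last) socket numbers, 1-based and inclusive, owned by each node
    when sockets are enumerated contiguously node by node."""
    ranges = []
    first = 1
    for d in degrees:
        ranges.append((first, first + d - 1))
        first += d
    return ranges

# Three variable nodes with degrees d1=3, d2=2, d3=4 (made-up values).
ranges = socket_ranges([3, 2, 4])
# ranges == [(1, 3), (4, 5), (6, 9)], i.e. 1..d1, d1+1..d1+d2, d1+d2+1..d1+d2+d3
```

The same function applies unchanged to the constraint node side when given the constraint node
degrees.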

The notion of message passing algorithms implemented on graphs is more
general than LDPC decoding. The general view is a graph with nodes exchanging messages
along edges in the graph and performing computations based on incoming messages in order to
produce outgoing messages.
An exemplary bipartite graph 100 determining a (3,6) regular LDPC code of
length ten and rate one-half is shown in Fig. 1. Length ten indicates that there are ten variable
nodes V1-V10, each identified with one bit of the codeword X1-X10 (and no puncturing in this
case), generally identified by reference numeral 102. Rate one half indicates that there are half
as many check nodes as variable nodes, i.e., there are five check nodes C1-C5 identified by
reference numeral 106. Rate one half further indicates that the five constraints are linearly
independent, as discussed below. Each of the lines 104 represents an edge, e.g., a
communication path or connection, between the check nodes and variable nodes to which the
line is connected. Each edge identifies two sockets, one variable socket and one constraint
socket. Edges can be enumerated according to their variable sockets or their constraint sockets.
The variable sockets enumeration corresponds to the edge ordering (top to bottom) as it appears
on the variable node side at the point where they are connected to the variable nodes. The
constraint sockets enumeration corresponds to the edge ordering (top to bottom) as it appears on
the constraint node side at the point they are connected to the constraint nodes. During
decoding, messages are passed in both directions along the edges. Thus, as part of the decoding
process messages are passed along an edge from a constraint node to a variable node and vice
versa.

While Fig. 1 illustrates the graph associated with a code of length 10, it can be
appreciated that representing the graph for a codeword of length 1000 would be 100 times more
complicated.

An alternative to using a graph to represent codes is to use a matrix
representation such as that shown in Fig. 2. In the matrix representation of a code, the matrix H
202, commonly referred to as the parity check matrix, includes the relevant edge connection,
variable node and constraint node information. In the matrix H, each column corresponds to one
of the variable nodes while each row corresponds to one of the constraint nodes. Since there are
10 variable nodes and 5 constraint nodes in the exemplary code, the matrix H includes 10
columns and 5 rows. The entry of the matrix corresponding to a particular variable node and a
particular constraint node is set to 1 if an edge is present in the graph, i.e., if the two nodes are
neighbors, otherwise it is set to 0. For example, since variable node V1 is connected to
constraint node C1 by an edge, a one is located in the uppermost lefthand corner of the matrix
202. However, variable node V4 is not connected to constraint node C1 so a 0 is positioned in
the fourth position of the first row of matrix 202 indicating that the corresponding variable and
constraint nodes are not connected. We say that the constraints are linearly independent if the
rows of H are linearly independent vectors over GF[2] (a Galois field of order 2). Enumerating
edges by sockets, variable or constraint, corresponds to enumerating the 1's in H. Variable
socket enumeration corresponds to enumerating top to bottom within columns and proceeding
left to right from column to column, as shown in matrix 208. Constraint socket enumeration
corresponds to enumerating left to right across rows and proceeding top to bottom from row to
row, as shown in matrix 210.
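
The two enumerations of the 1's in H can be sketched in Python as follows. The small matrix used
here is a made-up example, not the matrix 202 of Fig. 2, and the function name is hypothetical.

```python
def enumerate_sockets(H):
    """Label every 1 in H with its variable socket number (column by column, top to bottom)
    and with its constraint socket number (row by row, left to right). Zeros stay 0."""
    rows, cols = len(H), len(H[0])
    var_num = [[0] * cols for _ in range(rows)]
    chk_num = [[0] * cols for _ in range(rows)]
    n = 0
    for c in range(cols):        # proceed left to right from column to column
        for r in range(rows):    # top to bottom within a column
            if H[r][c]:
                n += 1
                var_num[r][c] = n
    n = 0
    for r in range(rows):        # proceed top to bottom from row to row
        for c in range(cols):    # left to right across a row
            if H[r][c]:
                n += 1
                chk_num[r][c] = n
    return var_num, chk_num

H_small = [[1, 0, 1],
           [1, 1, 0]]
var_num, chk_num = enumerate_sockets(H_small)
# var_num == [[1, 0, 4], [2, 3, 0]]
# chk_num == [[1, 0, 2], [3, 4, 0]]
```

Reading the two labelings at the same matrix position gives the pairing of a variable socket with
a constraint socket, i.e., the interleaver defined by the graph.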

In the case of a matrix representation, the codeword X which is to be transmitted
can be represented as a vector 206 which includes the bits X1-Xn of the codeword to be
processed. A bit sequence X1-Xn is a codeword if and only if the product of the matrix 202
and the vector 206 is equal to zero, that is: Hx=0.
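
The codeword test Hx=0, with arithmetic over GF[2], can be sketched in a few lines of Python. The
matrix and bit sequences below are made-up values used only for illustration.

```python
def is_codeword(H, x):
    """Return True if every row of H has even overlap with x, i.e. Hx = 0 over GF[2]."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)

# Hypothetical parity check matrix with 2 constraints on 4 bits.
H_example = [[1, 1, 0, 1],
             [0, 1, 1, 1]]

valid = is_codeword(H_example, [1, 1, 1, 0])    # both constraints see an even number of ones
invalid = is_codeword(H_example, [1, 0, 0, 0])  # the first constraint sees a single one
```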

In the context of discussing codewords associated to LDPC graphs, it should be
appreciated that in some cases the codeword may be punctured. Puncturing is the act of
removing bits from a codeword to yield, in effect, a shorter codeword. In the case of LDPC
graphs this means that some of the variable nodes in the graph correspond to bits that are not
actually transmitted. These variable nodes and the bits associated with them are often referred to
as state variables. When puncturing is used, the decoder can be used to reconstruct the portion
of the codeword which is not physically communicated over a communications channel. Where
a punctured codeword is transmitted the receiving device may initially populate the missing
received word values (bits) with ones or zeros assigned, e.g., in an arbitrary fashion, together
with an indication (soft bit) that these values are completely unreliable, i.e., that these values are
erased. For purposes of explaining the invention, we shall assume that, when used, these
receiver-populated values are part of the received word which is to be processed.

Consider for example the system 350 shown in Fig. 3. The system 350 includes
an encoder 352, a decoder 357 and a communication channel 356. The encoder 352 includes an
encoding circuit 353 that processes the input data A to produce a codeword X. The codeword X
includes, for the purposes of error detection and/or correction, some redundancy. The codeword
X may be transmitted over the communications channel. Alternatively, the codeword X can be
divided via a data selection device 354 into first and second portions X', X'' respectively by
some data selection technique. One of the codeword portions, e.g., the first portion X', may then
be transmitted over the communications channel to a receiver including decoder 357 while the
second portion X'' is punctured. As a result of distortions produced by the communications
channel 356, portions of the transmitted codeword may be lost or corrupted. From the decoder's
perspective, punctured bits may be interpreted as lost.

At the receiver soft bits are inserted into the received word to take the place of
lost or punctured bits. The inserted soft bits indicate erasure of X'' bits and/or bits lost in
transmission.

The decoder 357 will attempt to reconstruct the full codeword X from the
received word Y and any inserted soft bits, and then perform a data decoding operation to
produce A from the reconstructed codeword X.

The decoder 357 includes a channel decoder 358 for reconstructing the complete
codeword X from the received word Y. In addition it includes a data decoder 359 for removing
the redundant information included in the codeword to produce the original input data A from
the reconstructed codeword X.



It will be appreciated that received words generated in conjunction with LDPC
coding can be processed by performing LDPC decoding operations thereon, e.g., error
correction and detection operations, to generate a reconstructed version of the original
codeword. The reconstructed codeword can then be subject to data decoding to recover the
original data that was coded. The data decoding process may be, e.g., simply selecting a specific
subset of the bits from the reconstructed codeword.

LDPC decoding operations generally comprise message passing algorithms.
There are many potentially useful message passing algorithms and the use of such algorithms is
not limited to LDPC decoding. The current invention can be applied in the context of virtually
any such message passing algorithm and therefore can be used in various message passing
systems of which LDPC decoders are but one example.
For completeness we will give a brief mathematical description of one realization
of one of the best known message passing algorithms, known as belief propagation.

Belief propagation for (binary) LDPC codes can be expressed as follows.
Messages transmitted along the edges of the graph are interpreted as log-likelihoods
$\log\frac{p_0}{p_1}$ for the bit associated to the variable node. Here, $(p_0, p_1)$ represents
a conditional probability distribution on the associated bit. The soft bits provided to the
decoder by the receiver are also given in the form of a log-likelihood. Thus, the received
values, i.e., the elements of the received word, are log-likelihoods of the associated bits
conditioned on the observation of the bits provided by the communication channel. In general, a
message m represents the log-likelihood m and a received value y represents the log-likelihood
y. For punctured bits the received value y is set to 0, indicating $p_0 = p_1 = 1/2$.
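
A minimal Python sketch of the log-likelihood convention follows. The function names are
hypothetical, and the use of None to mark punctured positions is an assumption made only for this
illustration.

```python
import math

def log_likelihood(p0):
    """Log-likelihood log(p0/p1) of a bit, given the probability p0 that the bit is zero."""
    p1 = 1.0 - p0
    return math.log(p0 / p1)

def prob_zero(m):
    """Invert the log-likelihood: recover p0 from a message m = log(p0/p1)."""
    return 1.0 / (1.0 + math.exp(-m))

def fill_punctured(received):
    """Replace punctured positions (marked None in this sketch) by 0, i.e. p0 = p1 = 1/2."""
    return [0.0 if y is None else y for y in received]

erased = log_likelihood(0.5)                 # exactly 0.0: no information about the bit
word = fill_punctured([1.5, None, -0.2])     # -> [1.5, 0.0, -0.2]
```

A positive value favors a zero bit and a negative value favors a one bit, with the magnitude
expressing reliability.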

Let us consider the message-passing rules of belief propagation. Messages are
denoted by $m^{C2V}$ for messages from check nodes to variable nodes and by $m^{V2C}$ for
messages from variable nodes to check nodes. Consider a variable node with d edges. For each
edge j=1,...,d let $m^{C2V}(j)$ denote the incoming message on edge j. At the very beginning of
the decoding process we set $m^{C2V}=0$ for every edge. Then, outgoing messages are given by

$$m^{V2C}(j) = y + \left(\sum_{i=1}^{d} m^{C2V}(i)\right) - m^{C2V}(j).$$
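
The variable node rule can be sketched directly in Python. The function name and the numeric
values are illustrative assumptions; messages are plain floating point log-likelihoods rather
than the finite-precision values a hardware decoder would use.

```python
def variable_node_update(y, incoming):
    """Compute the outgoing V2C message on each edge of a variable node.

    y        -- received value (log-likelihood) for this node's bit
    incoming -- list of C2V messages, one per edge
    """
    total = y + sum(incoming)
    # Each outgoing message excludes the incoming message that arrived on the same edge.
    return [total - m for m in incoming]

first_pass = variable_node_update(0.5, [0.0, 0.0, 0.0])    # all C2V messages start at 0
later_pass = variable_node_update(0.5, [1.0, -2.0, 0.25])
# first_pass == [0.5, 0.5, 0.5]
# later_pass == [-1.25, 1.75, -0.5]
```

On the first pass every outgoing message simply equals the received value y, as implied by the
initialization of the incoming messages to zero.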

At the check nodes it is more convenient to represent the messages using their
'sign' and magnitudes. Thus, for a message m let $m_p \in GF[2]$ denote the 'parity' of the
message, i.e., $m_p = 0$ if $m \ge 0$ and $m_p = 1$ if $m < 0$. Additionally let
$m_r \in [0, \infty]$ denote the magnitude of m. Thus, we have $m = (-1)^{m_p} m_r$. At the
check node the updates for $m_p$ and $m_r$ are separate. We have, for a check node of degree d,

$$m_p^{C2V}(j) = \left(\sum_{i=1}^{d} m_p^{V2C}(i)\right) - m_p^{V2C}(j),$$

where all addition is over GF[2], and

$$m_r^{C2V}(j) = F^{-1}\left(\left(\sum_{i=1}^{d} F\left(m_r^{V2C}(i)\right)\right) - F\left(m_r^{V2C}(j)\right)\right),$$

where we define $F(x) := \log\coth(x/2)$. (In both of the above equations the
superscript V2C denotes the incoming messages at the check node.) We note that F is its own
inverse, i.e., $F^{-1}(x) = F(x)$.
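
A floating point sketch of the check node rule is given below. The function names are
hypothetical. As a deliberate deviation from the literal formula, the sketch sums F over the
other edges directly instead of subtracting the j-th term from the full sum; the two are equal
mathematically, but the direct sum avoids an undefined infinity-minus-infinity when an incoming
magnitude is zero.

```python
import math

def F(x):
    """F(x) = log coth(x/2) for x in [0, inf]; F(0) = inf and F(inf) = 0."""
    t = math.tanh(x / 2.0)
    if t <= 0.0:
        return math.inf
    return abs(math.log(t))  # log(t) <= 0 because 0 < t <= 1

def check_node_update(incoming):
    """Compute the outgoing C2V message on each edge from the incoming V2C messages."""
    parities = [1 if m < 0 else 0 for m in incoming]
    magnitudes = [abs(m) for m in incoming]
    outgoing = []
    for j in range(len(incoming)):
        # Parity: GF[2] sum of the parities on all other edges.
        p = sum(parities[:j] + parities[j + 1:]) % 2
        # Magnitude: F is its own inverse, so F is applied again to the sum.
        f_sum = sum(F(r) for i, r in enumerate(magnitudes) if i != j)
        r_out = F(f_sum)
        outgoing.append(-r_out if p else r_out)
    return outgoing

out = check_node_update([2.0, -1.0, 3.0])
```

In this example the messages returned on the first and third edges are negative, because the one
negative incoming message lies among their other edges, while the message on the second edge is
positive.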

Most message passing algorithms can be viewed as approximations to belief
propagation. It will be appreciated that in any practical digital implementation messages will be
comprised of a finite number of bits and the message update rules suitably adapted.

It should be apparent that the complexity associated with representing LDPC
codes for large codewords is daunting, at least for hardware implementations trying to exploit
parallelism. In addition, it can be difficult to implement message passing in a manner that can
support processing at high speeds.



In order to make the use of LDPC codes more practical, there is a need for
methods of representing LDPC codes corresponding to large codewords in an efficient and
compact manner thereby reducing the amount of information required to represent the code, i.e.,
to describe the associated graph. In addition, there is a need for techniques which will allow the
message passing associated with multiple nodes and multiple edges, e.g., four or more nodes or
edges, to be performed in parallel in an easily controlled manner, thereby allowing even large
codewords to be efficiently decoded in a reasonable amount of time. There is further need for a
decoder architecture that is flexible enough to decode several different LDPC codes. This is
because many applications require codes of different lengths and rates. Even more desirable is
an architecture that allows the specification of the particular LDPC code to be programmable.

Brief Description of the Figures

Figure 1 illustrates a bipartite graph representation of an exemplary regular
LDPC code of length ten.

Figure 2 is a matrix representation of the code graphically illustrated in Fig. 1.

Figure 3 illustrates coding, transmission, and decoding of data.



Figure 4 is a bipartite graph representation of an exemplary irregular LDPC code. 

Figure 5, which comprises the combination of Figs. 5a through 5d, illustrates
steps performed as part of an LDPC decoding operation in accordance with the LDPC code
illustrated in Fig. 4.

Figure 6 is a graphical representation of a small LDPC code which is used as the
basis of a much larger LDPC code to present an example in accordance with the present
invention.

Figure 7 illustrates the parity check matrix representation of the small LDPC 
code graphically illustrated in Fig. 6. 

Figure 8 illustrates how the edges in the code shown in Fig. 6 can be arranged,
e.g., enumerated, in order from the variable node side and how the same edges would appear
from the constraint node side.

Figure 9 illustrates a system for performing a serial LDPC decoding operation.

Figure 10 graphically illustrates the effect of making three copies of the small
LDPC graph shown in Fig. 6.

Figure 11 illustrates the parity check matrix representation of the LDPC graph
illustrated in Fig. 10.

Figure 12 illustrates how the edges in the code shown in Fig. 11 can be arranged,
e.g., enumerated, in order from the variable node side and how the same edges will appear from
the constraint node side.

Figure 13 illustrates the effect of replacing the 3x3 identity matrices shown in
Fig. 11 with cyclic permutation matrices in accordance with one exemplary embodiment of the
present invention.

Figure 14 illustrates how the edges in the code shown in Fig. 13 can be
enumerated in order from the variable node side, and how the same edges will appear from the
constraint node side after being subject to a cyclic permutation in accordance with the invention.

Figure 15 illustrates an LDPC decoder implemented in accordance with the
present invention that vectorizes the decoder of Fig. 9.

Figures 16 and 17 illustrate other LDPC decoders implemented in accordance
with the present invention.

Summary of the Invention

The present invention is directed to methods and apparatus for performing
decoding operations on words using message passing decoding techniques. The techniques of
the present invention are particularly well suited for use with large LDPC codes, e.g., codewords
of lengths greater than 750 bits, but they can be used for shorter lengths also. The techniques
and apparatus of the present invention can also be used for graph design and decoding where
other types of message passing algorithms are used. For purposes of explaining the invention,
however, exemplary LDPC decoders and decoding techniques will be described.

The techniques of the present invention allow for decoding of LDPC graphs that
possess a certain hierarchical structure in which a full LDPC graph appears to be, in large part,
made up of multiple copies, Z say, of a Z times smaller graph. The Z graph copies may be
identical. To be precise we will refer to the smaller graph as the projected graph. The technique
can be best appreciated by first considering a decoder that decodes Z identical small LDPC
graphs synchronously and in parallel. Consider a message passing decoder for a single small
LDPC graph. The decoder implements a sequence of operations corresponding to a message
passing algorithm. Consider now augmenting the same decoder so that it decodes Z identical
such LDPC graphs synchronously and in parallel. Each operation in the message passing
algorithm is replicated Z times. Note that the efficiency of the decoding process is improved
because, in total, decoding proceeds Z times faster and because the control mechanisms required
to control the message passing process need not be replicated for the Z copies but can rather be
shared by the Z copies. We can also view the above Z-parallel decoder as a vector decoder. We
can view the process of making Z copies of the smaller graph as vectorizing the smaller
(projected) graph: Each node of the smaller graph becomes a vector node, comprising Z nodes,
each edge of the smaller graph becomes a vector edge, consisting of Z edges, each message
exchanged in decoding the smaller graph becomes a vector message, comprising Z messages.

The present invention obtains the efficiencies of the above described
vectorization while modifying it so that the vector decoder is in fact decoding one large graph, Z
times larger than the projected graph. This is accomplished by interconnecting the Z copies of
the projected graph in a controlled manner. Specifically, we allow the Z edges within a vector
edge to undergo a permutation, or exchange, between copies of the projected graph as they go,
e.g., from the variable node side to the constraint node side. In the vectorized message passing
process corresponding to the Z parallel projected graphs this exchange is implemented by
permuting messages within a vector message as it is passed from one side of the vectorized
graph to the other.

Consider indexing the projected LDPC graphs by 1,...,j,...,Z. In the strictly parallel
decoder variable nodes in graph j are connected only to constraint nodes in graph j. In
accordance with the present invention, we take one vector edge, including one corresponding
edge each from each graph copy, and allow a permutation within the Z edges, e.g., we permit the
constraint sockets corresponding to the edges within the vector edge to be permuted, e.g.,
re-ordered. Henceforth we will often refer to the permutations, e.g., re-orderings, within the
vector edges as rotations.

Thus, in accordance with the present invention, a relatively large graph can be
represented, e.g., described, using relatively little memory. For example, a graph may be
represented by storing information describing the projected graph and information describing the
rotations. Alternatively, the description of the graph may be embodied as a circuit that
implements a function describing the graph connectivity.
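
To illustrate how a projected graph plus rotations compactly describes a large graph, the
following Python sketch expands such a description into a full parity check matrix. The triple
format (constraint, variable, rotation), the cyclic form of the rotation, the node-major index
layout and the function name are all assumptions made for this illustration; the text above does
not prescribe a storage format.

```python
def expand_graph(projected_edges, num_checks, num_vars, Z):
    """Expand a projected graph with one cyclic rotation per edge into the full matrix.

    projected_edges -- list of (constraint, variable, rotation) triples, 0-based
    Copy z of variable node v is connected to copy (z + rotation) mod Z of constraint c.
    Row c*Z + z is copy z of constraint c; column v*Z + z is copy z of variable v.
    """
    H = [[0] * (num_vars * Z) for _ in range(num_checks * Z)]
    for c, v, rot in projected_edges:
        for z in range(Z):
            H[c * Z + (z + rot) % Z][v * Z + z] = 1
    return H

# Projected graph: one constraint node, two variable nodes, Z = 3 copies.
# The first vector edge is left unrotated; the second is rotated by one position.
H_big = expand_graph([(0, 0, 0), (0, 1, 1)], num_checks=1, num_vars=2, Z=3)
# H_big == [[1, 0, 0, 0, 0, 1],
#           [0, 1, 0, 1, 0, 0],
#           [0, 0, 1, 0, 1, 0]]
```

Only two triples are stored here, while the expanded graph has six edges. Each projected edge
becomes a Z x Z permutation block, so every node in a vector node keeps the degree of the
corresponding projected node.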
Accordingly, the graph representation technique of the present invention
facilitates parallel, e.g., vectorized, graph implementations. Furthermore, the graph
representation techniques of the present invention can be used to support decoding of regular or
irregular graphs, with or without state variables. Information describing the degrees of the
nodes in the projected graph may be stored and provided to a vector node processing element.
Note that all nodes belonging to a vector node will have the same degree so degree information
is required only for one projected graph.

In various embodiments, the decoder is made programmable thereby allowing it
to be programmed with multiple graph descriptions, e.g., as expressed in terms of stored
projected graph and stored rotation information or in terms of an implemented function.
Accordingly, the decoders of the present invention can be programmed to decode a large number
of different codes, e.g., both regular and irregular. In some particular embodiments the decoder
is used for a fixed graph or for fixed degrees. In such embodiments the
graph description information may be preprogrammed or implicit. In such cases the decoder
may be less flexible than the programmable embodiments but the resources required to support
programmability are saved.

In accordance with one embodiment of the present invention, a message memory
is provided which includes rows of memory locations, each row corresponding to the messages
associated with one copy of the projected graph. The messages corresponding to the Z multiple
projected graphs are stacked to form columns of Z messages per column; such a column
corresponds to a vector message. This memory arrangement allows the vector messages, e.g.,
set of Z messages, corresponding to a vector edge to be read out of or written to memory as a unit,
e.g., using a SIMD instruction to access all the Z messages in a column in one operation. Thus,
memory supports reading and writing of vector messages as units. Accordingly, the present
invention avoids the need to provide a different read/write address for each individual message
in a set of Z messages.



At one or more points in the message passing processing, after being read out of
memory, the Z messages are subject to a permutation operation, e.g., a re-ordering operation.
The re-ordering operation may be a rotation operation, or rotation for short. These rotation
operations correspond to the rotations associated to the vector edges which interconnect the Z
copies of the projected graph to form the single large graph. This rotation may be applied, e.g.,
prior to the messages being supplied to a corresponding vector (constraint or variable) node
processor. Alternatively the rotation may be applied subsequent to processing by a vector node
processor.

The rotation may be implemented using a simple switching device which
connects, e.g., the message memory to the vector node processing unit and re-orders those
messages as they pass from the memory to the vector node processing unit. In such an
exemplary embodiment, one of the messages in each vector message read from memory is
supplied to a corresponding one of the Z parallel node processing units, within a vector node
processor, as determined by the rotation applied to the vector message by the switching device.
A rotation operation as implemented by the switching device may also or alternatively be
applied to the vector message prior to its being written into memory and after node processing.
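
The memory arrangement and switching operation described above can be illustrated in software.
The following is a minimal sketch only, not the circuit of any figure. The values of Z and
NUM_EDGES, the function names, and the direction of the rotation (output position i takes input
position (i + shift) mod Z) are assumptions made for illustration.

```python
from typing import List

Z = 4            # number of copies of the projected graph (assumed value)
NUM_EDGES = 3    # number of edges in the projected graph (assumed value)

# memory[k][e] holds the message for edge e in copy k of the projected graph.
# Each row is one copy, so column e is the vector message of vector edge e.
memory: List[List[int]] = [[10 * k + e for e in range(NUM_EDGES)] for k in range(Z)]


def read_vector_message(mem: List[List[int]], edge: int) -> List[int]:
    """Read all Z messages of one vector edge using a single column address."""
    return [row[edge] for row in mem]


def write_vector_message(mem: List[List[int]], edge: int, vec: List[int]) -> None:
    """Write all Z messages of one vector edge back to a single column."""
    for row, value in zip(mem, vec):
        row[edge] = value


def rotate(vec: List[int], shift: int) -> List[int]:
    """Cyclic permutation: output position i receives input position (i + shift) mod Z."""
    z = len(vec)
    return [vec[(i + shift) % z] for i in range(z)]


column = read_vector_message(memory, edge=1)   # one address selects Z messages
routed = rotate(column, shift=1)               # routed[i] goes to node processing unit i
```

Only one address (the column index) and one rotation amount are needed per vector message,
rather than Z separate addresses.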

The stored or computed description of the projected graph may include, e.g.,
information on the order in which messages in a row corresponding to a projected graph are to
be read out of and/or written in to memory during constraint and/or variable node processing.
The messages of the entire large graph are stored in multiple rows, each row corresponding to a
different copy of the small graph, the rows being arranged to form columns of messages. Each
column of messages represents a vector message, which can be accessed as a single unit. Thus,
the information on how to access messages in a row of a projected graph can be used to
determine the order in which vector messages corresponding to multiple copies of the projected
graph are accessed in accordance with the present invention.

The varying of the order in which vector messages are read out and/or written to
memory according to whether the read/write operation corresponds to variable node side or
constraint node side processing may be described as a first permutation performed on the
messages. This permutation corresponds to the interleaver associated to the projected graph. In
order to represent the large decoder graph from the projected decoder graph, a second set of
permutation information, e.g., the rotation information, is stored in addition to vector message
(e.g., column) access order information. The second permutation information (e.g., the rotation
information), representing switching control information, indicates how messages in each vector
message, e.g., column of messages, should be reordered when, e.g., read out of and/or written in
to memory. This two-stage permutation factors the larger permutation describing the complete
LDPC graph into two parts implemented via different mechanisms.



In one particular embodiment, a cyclic permutation is used as the second level
permutation because of the ease with which such a permutation can be implemented and the
compactness of its description. This case motivates the use of the term rotation to describe this
second level permutation for purposes of explanation. However, it is to be understood that the
second level permutation need not be limited to rotations and can be implemented using other
re-ordering schemes.
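
The factoring of the large permutation into a column access order and a per-column rotation can
be sketched as follows. This is an illustrative sketch under stated assumptions: the rotation
amounts in ROTATIONS, the value Z = 3, and the function name are hypothetical, and ORDER is the
constraint side edge order of the small example graph discussed later in this description,
written with 0-based indices.

```python
from typing import List, Tuple

Z = 3  # number of copies of the projected graph (assumed value)

# First permutation: constraint side access order of the 12 projected edges (0-based).
ORDER: List[int] = [0, 3, 6, 1, 8, 10, 2, 4, 11, 5, 7, 9]

# Second permutation: one cyclic shift per projected edge (hypothetical values).
ROTATIONS: List[int] = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]


def large_graph_permutation(order: List[int], rotations: List[int],
                            z: int) -> List[Tuple[int, int]]:
    """List the (edge, copy) variable sockets in constraint side order for the large graph."""
    pairs = []
    for edge in order:                  # stage 1: which column (vector message) to access
        shift = rotations[edge]
        for copy in range(z):           # stage 2: rotation within that column
            pairs.append((edge, (copy + shift) % z))
    return pairs


pairs = large_graph_permutation(ORDER, ROTATIONS, Z)
```

The large graph here has 36 edges, yet it is described by 12 access order entries and 12 rotation
amounts rather than by an explicit 36-entry permutation.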



In various embodiments of the present invention, the decoder generates multi-bit
soft outputs with one bit, e.g., the sign or parity bit of each soft output, corresponding to a hard
decision output of the decoder, e.g., the original codeword in the case where all errors have been
corrected or no errors are present in the received word. The decoder output, e.g., the recovered
codeword, may then be processed further to recover the original data which was used at
encoding time to produce the transmitted codeword.



In accordance with one feature of the present invention, soft and/or hard outputs
produced after each complete iteration of variable node processing are examined to determine if
the parity check constraints indicative of a codeword are satisfied by the current hard decisions.
This checking process also enjoys the benefits of the graph's two stage factored permutation
structure. The iterative decoding process (message passing) may be halted once recovery of a
codeword is detected in this manner. Accordingly, in the case of relatively error free signals,
decoding may be completed and detected promptly, e.g., after two or three iterations of the
message passing decoding process. However, in the case of received words that include more
errors, numerous iterations of the decoding process may occur before decoding is successful or
the process is halted due to a time out constraint.

Prompt detection of successful decoding, in accordance with the present
invention, allows for more efficient use of resources as compared to systems that allocate a fixed
number of decoding iterations to each received word.



Since the decoding techniques of the present invention allow for a large number
of decoding operations, e.g., constraint and/or variable node decoder processing operations, to
be performed in parallel, the decoders of the present invention can be used to decode received
words at high speeds. Furthermore, given the novel technique of the present invention used to
represent large graphs and/or controlling message passing for decoding operations associated
with such graphs, the difficulties of storing the descriptions of large graphs and controlling their
message routing are reduced and/or overcome.

Certain generalizations of LDPC codes and decoding techniques of the invention
include coding/decoding over larger alphabets, i.e., not simply bits, which have two possible
values, but symbols with some larger number of possibilities. Codes where constraint nodes
represent constraints other than parity check constraints may also be decoded using the methods
and apparatus of the present invention. Other relevant generalizations to which the invention
can be applied include situations where a message passing algorithm is to be implemented on a
graph and one has the option to design the graph. It will be apparent to those skilled in the art,
in view of the present patent application, how to apply the techniques of the present invention to
these more general situations.

Numerous additional advantages, features and aspects of the decoding techniques
and decoders of the present invention will be apparent from the detailed description which
follows.

Detailed Description of the Invention

As discussed above, the decoding methods and apparatus of the present invention
will be described for purposes of explanation in the context of an LDPC decoder embodiment.
Steps involved in decoding of an LDPC code will first be described with reference to Figs. 4 and
5 followed by a more detailed discussion of various features of the present invention.

Figure 4 illustrates an exemplary irregular LDPC code using a bipartite graph
400. The graph includes m check nodes 402, n variable nodes 406, and a plurality of edges 404.
Messages between the check nodes and variable nodes are exchanged over the edges 404. Soft
input bits y1 through yn, corresponding to the received word Y, and soft (or hard) outputs x1
through xn are indicated by reference numeral 408. The mth check node is identified using
reference numeral 402', the nth variable node is identified using reference numeral 406' while the
nth soft input yn and the nth soft output xn are indicated in Fig. 4 using reference numbers 410,
409 respectively.

Variable nodes 406 process messages from the constraint nodes 402 together with
the input soft values from the received word y1,...,yn to update the value of the output variables
x1,...,xn corresponding to the variable nodes and to generate messages to the constraint nodes.
One message is generated by a variable node for each edge connected to the variable node. The
generated message is transmitted along the edge from the variable node to the constraint node
attached to the edge. For purposes of explanation, messages from variable nodes to constraint
nodes will, from time to time in the present application, be indicated by using the abbreviation
V2C while messages from constraint nodes to variable nodes will be indicated by using the
abbreviation C2V. Indices may be added to the V and C components of this abbreviation to
indicate the particular one of the variable nodes and constraint nodes which serves as the
source/destination of a particular message. Each constraint node 402 is responsible for
processing the messages received from the variable nodes via the edges attached to the particular
constraint node. The V2C messages received from the variable nodes are processed by the
constraint nodes 402 to generate C2V messages which are then transmitted back along the edges
attached to each constraint node. The variable nodes 406 then process the C2V messages,
together with the soft input values, to generate and transmit new V2C messages, and generate
soft outputs, xi. The sequence of performing processing at the variable nodes 406, comprising
transmitting generated messages to the check nodes 402, generating at the variable nodes soft
outputs xi, and receiving messages from the check nodes, may be performed repeatedly, i.e.,
iteratively, until the outputs from the variable nodes 406 indicate that the codeword has been
successfully decoded or some other stopping criterion, e.g., completion of a fixed number of
message passing iterations, has been satisfied. It should be appreciated that the sequence of
operations described above need not occur strictly in the order described. Node processing may
proceed asynchronously and variable and constraint node processing may occur simultaneously.
Nevertheless, the logic of the iterative process is as described.

Messages, V2C and C2V, may be one or more bits, e.g., K bits each, where K is a
positive non-zero integer value. Similarly the soft outputs xi may be one or multiple bits.
Multiple bit messages and outputs provide the opportunity to relay confidence or reliability
information in the message or output. In the case of a multi-bit (soft) output, the sign of the soft
output value may be used to provide the single bit hard output of the decoding process
corresponding to a variable node, e.g., the bits of the decoded codeword. Output soft values
may correspond to decoded soft values or, alternatively, to so-called extrinsic information
(excluding the corresponding input information) which may be used in another larger iterative
process within which the LDPC decoder is but one module.

The iterative message passing process associated with decoding an LDPC code
will now be discussed further with respect to Figs. 5a through 5d.

When decoding an LDPC code, the processing at each constraint and variable
node may be performed independently. Accordingly, variable and/or constraint node processing
may be performed one node at a time, e.g., in sequence, until some or all of the variable and
constraint node processing has been completed for a particular iteration of the decoding process.
This allows a single unit of processing hardware to be provided and reused, if desired, to
perform the processing associated with each of the variable and/or constraint nodes. Another
significant feature of LDPC decoding is that the V2C and C2V messages used during a
particular processing iteration need not have been generated at the same time, e.g., during the
same processing iteration. This allows for implementations where constraint and variable node
processing can be performed in parallel without regard to when the utilized messages were last
updated. Following a sufficient number of message updates and iterations wherein all the
variable and constraint nodes process the received messages and generate updated messages, the
(hard) output of the variable nodes will converge assuming that the graph was properly designed
and there are no remaining uncorrected errors in the received word being processed.

Given that the processing at each check node and variable node can be viewed as
an independent operation, the iterative processing performed at a single exemplary check node
Cm 502' and variable node Vn 506' will now be discussed in more detail with reference to Figs.
5a-5d. For purposes of description we will think of message values and soft input and output
values as numbers. A positive number corresponds to a hard bit decision of 0 and a negative
number corresponds to a hard bit decision of 1. Larger magnitudes indicate larger reliability.
Thus, the number zero indicates total unreliability and the sign (positive or negative) is
irrelevant. This convention is consistent with standard practice in which soft values (messages,
received and output values) represent log-likelihoods of the associated bits, i.e., soft values take
the form

\[ \log \frac{\text{probability bit is a 0}}{\text{probability bit is a 1}} \]

where the probability is conditioned on some random variable, e.g., the physical observation of
the bit from the communications channel in the case of a received value.
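
The sign and magnitude convention just described can be made concrete with a short sketch. The
function names are illustrative, and the choice of returning 0 for a soft value of exactly zero
is an arbitrary assumption, since the text states that the sign is irrelevant in that case.

```python
import math


def log_likelihood_ratio(p0: float) -> float:
    """Soft value log(P(bit = 0) / P(bit = 1)) for a probability 0 < p0 < 1."""
    return math.log(p0 / (1.0 - p0))


def hard_decision(soft: float) -> int:
    """Positive soft values decide 0 and negative soft values decide 1.

    A soft value of exactly zero carries no information; 0 is returned by assumption.
    """
    return 1 if soft < 0 else 0


confident_zero = log_likelihood_ratio(0.99)   # large positive value
weak_zero = log_likelihood_ratio(0.60)        # small positive value
likely_one = log_likelihood_ratio(0.10)       # negative value
```

A bit that is equally likely to be 0 or 1 maps to the soft value zero, matching the statement
that zero indicates total unreliability.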

Fig. 5a illustrates the initial step in an LDPC decoding process. Initially the
variable node Vn 506' is supplied with the soft input, e.g., the received values (1 or more bits) yn
from a received word to be processed. The C2V messages at the start of a decoding operation
and the soft output Xn 509 are initially set to zero. Based on the received inputs, e.g., the zero
value C2V messages and input yn, the variable node Vn 506' generates one V2C message for
each check node to which it is connected. Typically, in the initial step, each of these messages
will be equal to yn.

In Fig. 5b generated V2C messages are shown being transmitted along each of
the edges connected to variable node Vn 506'. Thus, updated V2C messages are transmitted to
each of the check nodes 502 coupled to variable node Vn 506' including check node 502'.

In addition to generating the V2C messages, variable node processing results in
the updating of the soft output Xn 509' corresponding to the variable node doing the processing.
The soft output Xn is shown being updated in Fig. 5c. While shown as different steps, the soft
output may be output at the same time the V2C messages are output.
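
The variable node processing described with reference to Figs. 5a-5c can be sketched as follows.
The text does not spell out the update rule; the sketch assumes the customary sum rule for
log-likelihood messages, in which the soft output is the received value plus all incoming C2V
messages and each outgoing V2C message excludes the C2V message received on its own edge. The
function name is illustrative.

```python
from typing import List, Tuple


def variable_node_update(y: float, c2v: List[float]) -> Tuple[List[float], float]:
    """Return (V2C messages, soft output) for one variable node.

    y   -- soft input (received value) for this node
    c2v -- incoming C2V messages, one per edge attached to the node
    """
    soft_output = y + sum(c2v)
    # Each outgoing message omits the incoming message from the same edge.
    v2c = [soft_output - message for message in c2v]
    return v2c, soft_output


# Initial step: the C2V messages are zero, so every V2C message equals y.
initial_v2c, initial_soft = variable_node_update(-1.5, [0.0, 0.0, 0.0])

# A later step with nonzero C2V messages.
later_v2c, later_soft = variable_node_update(1.0, [2.0, -0.5])
```

Under this assumed rule the initial step reproduces the behavior stated above: with all C2V
messages at zero, each V2C message equals the received value.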

As will be discussed further below, in accordance with some embodiments of the
present invention, the soft outputs (or their associated hard decisions) may be used to determine
when a codeword has been recovered from the received word, i.e., when the parity constraints
have been satisfied by the output values. This indicates successful decoding (although the
codeword found may be incorrect, i.e., not the one that was transmitted) thereby allowing the
iterative decoding process to be halted in a timely fashion, e.g., before some fixed maximum
allowed number of message passing iterations is completed.



WO 02/103631 PCT/US02/17396

Check node processing can be performed once a check node, e.g., check node Cm
502', receives V2C messages along the edges to which it is connected. The received V2C
messages are processed in the check node to generate updated C2V messages, one for each edge
connected to the particular check node. As a result of check node processing, the C2V message
transmitted back to a variable node along an edge will depend on the value of each of the V2C
messages received on the other edges connected to the check node but (usually and preferably
but not necessarily) not upon the V2C message received from the particular variable node to
which the C2V message is being transmitted. Thus, C2V messages are used to transmit
information generated from messages received from variable nodes other than the node to which
the message is being transmitted.
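
The dependence of each C2V message on the other edges only can be illustrated with a short
sketch. The text does not fix a particular check node computation; the min-sum rule used here
(product of the signs and minimum of the magnitudes of the other incoming messages) is a
commonly used approximation chosen as an assumption for illustration, and the function name is
hypothetical. The node is assumed to have at least two edges.

```python
from typing import List


def check_node_update(v2c: List[float]) -> List[float]:
    """Return one C2V message per edge, each computed from the other edges only."""
    c2v = []
    for j in range(len(v2c)):
        others = v2c[:j] + v2c[j + 1:]          # exclude the edge being answered
        sign = 1.0
        for message in others:
            if message < 0:
                sign = -sign                    # parity of the other hard decisions
        c2v.append(sign * min(abs(message) for message in others))
    return c2v


outgoing = check_node_update([2.0, -1.0, 3.0])

# Changing the message on edge 0 leaves the reply on edge 0 unchanged.
outgoing_changed = check_node_update([9.0, -1.0, 3.0])
```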



Fig. 5d illustrates the passage of updated C2V messages to variable nodes
including node 506'. In particular, in Fig. 5d constraint node Cm 502' is shown outputting two
updated C2V messages with the updated Cm2Vn message being supplied to variable node Vn
506'. Vn 506' also receives additional updated C2Vn message(s) from another constraint node(s)
to which it is connected.



With the receipt of updated C2V messages, variable node processing can be
repeated to generate updated V2C messages and soft outputs. Then the updating of C2V
messages can be repeated and so on until the decoder stopping criterion is satisfied.

Thus, the processing shown in Figs. 5a-5d will be repeated after the first iteration,
using updated message values as opposed to initial values, until the decoding process is stopped.

The iterative nature of the LDPC decoding process, and the fact that the
processing at individual nodes can be performed independent of the processing at other nodes
provides for a great deal of flexibility when implementing an LDPC decoder. However, as
discussed above, the sheer complexity of the relationships between the edges and the nodes can
make storage of edge relationship information, e.g., the graph description, difficult. Even more
importantly, graph complexity can make message passing difficult to implement in parallel
implementations where multiple messages are to be passed at the same time.

Practical LDPC decoder implementations often include an edge memory for
storing messages passed along edges between constraint and/or variable nodes. In addition they
include a graph descriptor sometimes referred to as a permutation map which includes
information specifying edge connections, or socket pairing, thereby defining the decoding graph.
This permutation map may be implemented as stored data or as a circuit which calculates or
implies the permutation. In addition to the edge memory, one or more node processing units are
needed to perform the actual processing associated with a node.

Software LDPC decoder implementations are possible wherein software is used 
to control a CPU to operate as a vector processing unit and to control passing of messages using 
a memory coupled to the CPU. In software implementations, a single memory can also be used 
to store the decoder graph description, edge messages as well as decoder routines used to control 
the CPU.

As will be discussed below, in various embodiments of the present invention, one
or more edge memories may be used. In one exemplary multiple edge memory embodiment a
first edge memory is used for the storage and passing of C2V messages and a second edge
memory is used for the storage and passing of V2C messages. In such embodiments, multiple
node processing units, e.g., one to perform constraint node processing and another to perform
variable node processing may be, and often are, employed. As will be discussed below, such
embodiments allow for variable and constraint processing operations to be performed in parallel
with the resulting messages being written into each of the two message memories for use during
the next iteration of the decoding process.

We will now present a simple example of a small LDPC graph and its
representation which will be used subsequently in explaining the invention. The discussion of
the LDPC graph will be followed by a description of an LDPC decoder which can be used to
decode the small graph.

Fig. 6 illustrates a simple irregular LDPC code in the form of a graph 600. The
code is of length five as indicated by the 5 variable nodes V1 through V5 602. Four check nodes
C1 through C4 606 are coupled to the variable nodes 602 by a total of 12 edges 604 over which
messages may be passed.



Fig. 7 illustrates, using matrices 702, 704, the LDPC code shown in Fig. 6, in
parity check matrix form. As discussed above, edges are represented in the permutation matrix
H 702 using 1's. Bit xi is associated to variable node Vi. Matrices 706 and 708 show the 1's in
H, corresponding to edges in the graph, indexed according to the variable socket order and
constraint socket order, respectively.

For purposes of explanation, the 12 edges will be enumerated from the variable
node side, i.e., according to their variable sockets. The connections established by the edges
between the variable nodes 602 and check nodes 606 can be seen in Fig. 6. For purposes of
discussion the edges attached to variable node V1, which connect it to check nodes C1, C2 and
C3, are assigned labels 1, 2, 3, corresponding to variable socket enumeration. Variable node V2
is connected to check nodes C1, C3 and C4 by edges 4, 5 and 6, respectively. Variable node V3 is
coupled to check nodes C1 and C4 by edges 7 and 8, respectively. In addition, variable node V4
is coupled to check nodes C2 and C4 by edges 9 and 10, respectively, while variable node V5 is
coupled to check nodes C2 and C3 by edges 11 and 12, respectively. This indexing corresponds
with matrix 706 of Figure 7, i.e., variable socket order.
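
The edge enumeration just described can be reproduced in a few lines. This sketch is for
illustration only: the connection table is transcribed from the preceding paragraph, the matrix
H is derived from that table rather than copied from Fig. 7, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Check nodes attached to each variable node, in variable socket order.
VAR_TO_CHECKS: Dict[int, List[int]] = {
    1: [1, 2, 3],
    2: [1, 3, 4],
    3: [1, 4],
    4: [2, 4],
    5: [2, 3],
}


def enumerate_edges(var_to_checks: Dict[int, List[int]]) -> List[Tuple[int, int, int]]:
    """Return (edge label, variable node, check node) triples in variable socket order."""
    edges = []
    label = 1
    for v in sorted(var_to_checks):
        for c in var_to_checks[v]:
            edges.append((label, v, c))
            label += 1
    return edges


def parity_check_matrix(edges: List[Tuple[int, int, int]],
                        num_checks: int, num_vars: int) -> List[List[int]]:
    """Place a 1 in row c, column v for every edge joining check c and variable v."""
    h = [[0] * num_vars for _ in range(num_checks)]
    for _, v, c in edges:
        h[c - 1][v - 1] = 1
    return h


EDGES = enumerate_edges(VAR_TO_CHECKS)
H = parity_check_matrix(EDGES, num_checks=4, num_vars=5)
```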

Fig. 8 illustrates the relationship between the 12 edges of Fig. 6, as enumerated
from the variable node side, in relationship to the variable and check nodes to which they are
connected. Row 802 shows the 5 variable nodes V1 through V5. Beneath the variables 802 are
shown the edges 1 through 12 804 corresponding to the associated sockets which are connected
to the particular variable node. Note that since the edges are ordered from the variable node
side, in row 804 they appear in order from 1-12. Let us assume that messages are stored in
memory in the order indicated in row 804.

During variable node processing, the 12 edge messages in memory are accessed 
in sequence, e.g., in the order shown in 804. Thus, during variable node processing, the 
messages may simply be read out in order and supplied to a processing unit. 

Row 806 illustrates the four constraint nodes C1 through C4 present in the code
of Figs. 6 and 7. Note that the edges are re-ordered in row 804' to reflect the order in which
they are connected to the constraint nodes, but the indicated indexing is that induced from the
variable node side. Accordingly, assuming that the edge messages are stored in order from the
variable node side, when performing constraint node processing the messages would be read out
in the order illustrated in row 804'. That is, during constraint node processing the messages
would be read out of the memory in the order 1, 4, 7, 2, 9, 11, 3, 5, 12, 6, 8, 10. A message
ordering module can be used to output the correct sequence of edge message access information,
e.g., memory locations, for reading data from memory or writing data to memory during
variable and check node processing operations.
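
The constraint side read-out order stated above follows directly from the edge connections of
Fig. 6. The following sketch, with hypothetical names, derives it by grouping the variable
socket edge labels by the check node to which each edge is attached.

```python
from typing import List

# Check node attached to each edge, listed in variable socket order (edges 1..12),
# as transcribed from the description of Fig. 6.
EDGE_TO_CHECK: List[int] = [1, 2, 3, 1, 3, 4, 1, 4, 2, 4, 2, 3]


def constraint_socket_order(edge_to_check: List[int]) -> List[int]:
    """Return edge labels grouped by check node, keeping variable side order within a group."""
    labels = range(1, len(edge_to_check) + 1)
    # sorted() is stable, so edges of the same check node keep their relative order.
    return sorted(labels, key=lambda edge: edge_to_check[edge - 1])


order = constraint_socket_order(EDGE_TO_CHECK)
```

A message ordering module implemented as a look-up table would hold the resulting list.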

A serial LDPC decoder 900 which performs message processing operations
sequentially, one edge at a time, will now be discussed with regard to Fig. 9, and decoding using
the exemplary code shown in Fig. 6 will be discussed. The LDPC decoder 900 comprises a
decoder control module 902, a message ordering module (socket permutation memory) 904, a
node degree memory 910, an edge memory 906, a node processor 908, output buffer 916, hard
decision memory 912 and parity check verifier 914.

The edge memory 906 includes L K bit memory locations with each K bit
location corresponding to one edge and where L is the total number of edges in the LDPC graph
being used and K is the number of bits per message exchanged along an edge. For concreteness
we assume that the messages are stored in order according to the edge ordering induced by the
variable sockets. Thus, for the example graph 600 the messages corresponding to edges
1, 2, ..., 12 are stored in the indicated order. The hard decision memory 912 includes L 1 bit
memory locations, each 1 bit location corresponding to one edge. This memory stores hard
decisions transmitted by the variable nodes along each of their edges so that the parity check
constraints may be verified. The parity check verifier 914 receives the hard bit decisions as the
check node processor receives messages. The parity checks are verified in the parity check
verifier which, in the event that all checks are satisfied, transmits a convergence signal to the
decoder control module 902.



The message ordering module 904 may be implemented as a permutation map or
look-up table which includes information describing the ordering of messages in edge memory
as viewed from the variable node side or as viewed from the constraint node side. Thus, for our
example graph 600 the sequence 1,4,7,2,9,11,3,5,12,6,8,10 which specifies edge order as viewed
from the constraint side would be, effectively, stored in the message ordering module. This
sequence is used to order messages for constraint node processing and to order hard decisions
read out of Hard Decision Memory 912 for processing by the parity check verifier 914.

In the Fig. 9 decoder, messages corresponding to an edge are overwritten after
they are processed by a node processor. In this manner, the edge memory will alternate between
storing V2C messages and storing C2V messages. Hard decision verification occurs during
constraint node processing, e.g., as V2C messages are read out of edge message memory 906.

The decoder control unit 902 is responsible for toggling the decoder operation
between variable and check node processing modes of operation, for determining when the
iterative decoding process should be stopped, e.g., because of receipt of a convergence signal or
reaching a maximum allowed iteration count, for supplying or controlling the supply of degree
information to the node processing unit and the parity check verifier, and for controlling the
supply of an edge index to the Message Ordering Module 904. During operation, the decoder
control module 902 transmits an edge index to the message ordering module 904. The value,
edge index, is incremented over time to sequence through all the edges in the graph. A different,
e.g., unique, edge index is used for each edge in a graph being implemented. In response to each
received edge index, the message ordering module will output an edge identifier, e.g., edge
memory address information, thus selecting the edge memory location that will be accessed,
e.g., read from or written to, at any given time. Assuming variable socket ordering, the message
ordering module 904 will cause messages to be read out and written back in sequential order
during variable node processing and will cause the messages to be read out and written back in
order corresponding to constraint socket ordering during constraint node processing. Thus, in
our above example, the messages will be read out and written back in order 1,2,3,...,12 during
variable node processing and, concurrently, hard decisions will be written into hard decision
memory 912 in order 1,2,3,...,12. During constraint node processing the messages will be read
out and written back in order 1,4,7,2,9,11,3,5,12,6,8,10 and, concurrently, the message ordering
module 904 will cause hard decision bits to be read out of hard decision memory 912 in the
order 1,4,7,2,9,11,3,5,12,6,8,10.

As messages are read from the edge memory in response to the edge identifier
received from the decoder control module 902, they are supplied to the node processor 908. The
node processor 908 performs the appropriate constraint or variable node processing operation,
depending on the mode of operation, thereby using the received messages to generate updated
messages corresponding to the particular node being implemented at any given time. The
resulting updated messages are then written back into the edge memory overwriting the
messages which were just read from the memory. Messages sent to a particular node arrive at
the node processor as a contiguous block, i.e., one after another. The decoder control module
902 signals node delineation to the node processor, e.g., by indicating the last message
corresponding to a node thereby providing node degree information. In the case of the example
graph 600, the variable node degrees would be specified, e.g., as the sequence (3,3,2,2,2) and the
constraint node degrees would be specified, e.g., as the sequence (3,3,3,3). This information
may be stored in node degree memory 910 which would then be read by the decoder control
module 902 as it iterates over edge indices. Alternatively, the degree information may be
preprogrammed into each of the node processing units. This can be preferable, e.g., when it is
known in advance that the node degrees will be uniform, i.e., the graph will be regular.
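
Node delineation by degree sequence can be sketched as follows. The function name is
hypothetical; the degree sequences and edge orders are those given above for example graph 600.

```python
from typing import Iterator, List, Sequence


def split_by_degree(stream: Sequence[int], degrees: Sequence[int]) -> Iterator[List[int]]:
    """Cut a contiguous stream of edge entries into per-node blocks using node degrees."""
    start = 0
    for degree in degrees:
        yield list(stream[start:start + degree])
        start += degree


VARIABLE_ORDER = list(range(1, 13))
CONSTRAINT_ORDER = [1, 4, 7, 2, 9, 11, 3, 5, 12, 6, 8, 10]

variable_blocks = list(split_by_degree(VARIABLE_ORDER, (3, 3, 2, 2, 2)))
constraint_blocks = list(split_by_degree(CONSTRAINT_ORDER, (3, 3, 3, 3)))
```

Each block lists the edges whose messages arrive one after another at the node processor for a
single node.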

The parity check verifier 914 operates in much the same fashion as a check node
processor except that incoming messages are single bits, no outgoing message is computed, and
the internal computation is simpler.
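
A software analogue of the parity check verifier might look like the following sketch. It
assumes that the hard decision memory holds one bit per edge in variable socket order and that
each check is the modulo-2 sum of the bits on its edges; the function name and the example bit
patterns are illustrative.

```python
from typing import Sequence

CONSTRAINT_ORDER = [1, 4, 7, 2, 9, 11, 3, 5, 12, 6, 8, 10]
CHECK_DEGREES = (3, 3, 3, 3)


def parity_checks_satisfied(hard_bits: Sequence[int], order: Sequence[int],
                            degrees: Sequence[int]) -> bool:
    """Return True when every check node sees an even number of ones.

    hard_bits[e - 1] is the hard decision stored for edge e (variable socket order).
    """
    position = 0
    for degree in degrees:
        parity = 0
        for edge in order[position:position + degree]:
            parity ^= hard_bits[edge - 1]       # single-bit accumulation per check
        if parity != 0:
            return False
        position += degree
    return True


# Per-edge hard bits for the word (x1..x5) = (1, 1, 0, 1, 0), which satisfies all four checks.
codeword_bits = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
# Same word with x3 flipped (edges 7 and 8).
corrupted_bits = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

converged = parity_checks_satisfied(codeword_bits, CONSTRAINT_ORDER, CHECK_DEGREES)
not_converged = parity_checks_satisfied(corrupted_bits, CONSTRAINT_ORDER, CHECK_DEGREES)
```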



During variable node mode operation, variable node computations will be
performed one node at a time by the node processing unit until the processing, e.g., message
updating and soft output value generation operations associated with each of the variable nodes,
has been completed. Messages are delivered to the node processor 908 in variable node side
order so that all messages corresponding to one node arrive in sequence at the node processor
908. With an iteration of variable node processing completed, the decoder control module 902
causes the decoder 900 to switch into the constraint node mode of processing operation. In
response to the change in the C/V control signal, the node processing unit 908 switches from a
variable node processing mode into a constraint node processing mode. In addition the message
ordering module 904 switches into a mode wherein message identifiers will be supplied to the
edge memory in the constraint socket order. One or more control signals sent over the C/V
control line can be used to control the switch between constraint and variable node processing
modes of operation.

As the decoder control circuit 902 controls the decoder to perform constraint
node processing in constraint node sequence, one node at a time, the messages stored in the edge
memory will once again be updated, this time by the C2V messages generated by the constraint
node processing. When the processing associated with the full set of constraint nodes has been
completed, the decoder control circuit 902 will switch back to the variable node mode of
processing operation. In this manner, the decoder 900 toggles between variable node and
constraint node processing. As described, the processing is performed sequentially, one node at
a time, until the decoder control circuit 902 determines that the decoding operation has been
completed.



The scalar or sequential LDPC decoding system illustrated in Fig. 9 can be 
implemented using relatively little hardware. In addition it lends itself well to software 
implementation. Unfortunately, the sequential nature of the processing performed tends to result 
in a relatively slow decoder implementation. Accordingly, while the scalar architecture shown 
in Fig. 9 has some noteworthy attributes, it tends to be unsuitable for high bandwidth
applications such as optical communications or data storage where high decoding speed and the
use of large codewords is desired. 

Before presenting decoders for decoding large vectorized LDPC graphs, we will 
discuss general concepts and techniques relating to graph vectorizing features of the present
invention. The vectorizing discussion will be followed by a presentation of exemplary 
vectorized LDPC decoders which embody the present invention. 

For purposes of gaining an understanding of vectorizing LDPC graphs consider a
'small' LDPC code with parity check matrix H. The small graph, in the context of a larger
vectorized graph, will be referred to as the projected graph. Let Ψ denote a subset of Z x Z
permutation matrices. We assume that the inverses of the permutations in Ψ are also in Ψ.
Given the small, projected, graph we can form a Z-times larger LDPC graph by replacing each
element of H with a Z x Z matrix. The 0 elements of H are replaced with the zero matrix,
denoted 0. The 1 elements of H are each replaced with a matrix from Ψ. In this manner we
'lift' an LDPC graph to one Z times larger. The complexity of the representation comprises,
roughly, the number of bits required to specify the permutation matrices, |E_H| log |Ψ|, plus
the complexity required to represent H, where |E_H| denotes the number of 1s in H and |Ψ|
denotes the number of distinct permutations in Ψ. E.g., if Ψ is the space of cyclic
permutations then |Ψ| = Z. In practice we might have, e.g., Z = 16 for n ≈ 1000.



H =

    1     0     1     1     1     0     0
    1     1     1     0     0     1     0
    1     1     0     1     0     0     1
    0     1     0     0     1     1     1

H =

    σ_1   0     σ_7   σ_9   σ_11  0     0
    σ_2   σ_4   σ_8   0     0     σ_13  0
    σ_3   σ_5   0     σ_10  0     0     σ_15
    0     σ_6   0     0     σ_12  σ_14  σ_16

Example: lifting a small parity check matrix; the σ_i, i = 1,...,16, are elements of
Ψ shown here indexed in projected variable socket order.
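
The lifting step can be sketched in a few lines of Python. This is an illustrative
sketch only: it assumes Ψ is the set of cyclic permutations, uses Z = 4, and picks the
shift values arbitrarily. The function names cyclic_perm and lift are hypothetical. The
1 entries of H are visited column by column, which corresponds to the projected
variable socket order used in the example above.

```python
import math


def cyclic_perm(Z, shift):
    """Z x Z cyclic permutation matrix; row r has its 1 in column (r + shift) mod Z."""
    return [[1 if c == (r + shift) % Z else 0 for c in range(Z)] for r in range(Z)]


def lift(H, Z, shifts):
    """Replace each 1 of H with a cyclic permutation block and each 0 with a zero block."""
    rows, cols = len(H), len(H[0])
    big = [[0] * (cols * Z) for _ in range(rows * Z)]
    shift_iter = iter(shifts)
    for c in range(cols):              # column by column: variable socket order
        for r in range(rows):
            if H[r][c]:
                P = cyclic_perm(Z, next(shift_iter))
                for i in range(Z):
                    for j in range(Z):
                        big[r * Z + i][c * Z + j] = P[i][j]
    return big


H = [[1, 0, 1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0, 1, 0],
     [1, 1, 0, 1, 0, 0, 1],
     [0, 1, 0, 0, 1, 1, 1]]
Z = 4
num_ones = sum(map(sum, H))                   # number of 1s in H
shifts = [k % Z for k in range(num_ones)]     # arbitrary illustrative shifts
lifted = lift(H, Z, shifts)
spec_bits = num_ones * math.log2(Z)           # bits needed to name the permutations

print(len(lifted), len(lifted[0]), spec_bits)
```

Every row and column of the lifted matrix keeps the weight of the corresponding row or
column of H, so the node degrees of the projected graph carry over to the larger graph.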

The subset Ψ can in general be chosen using various criteria. One of the main
motivations for the above structure is to simplify hardware implementation of decoders.
Therefore, it can be beneficial to restrict Ψ to permutations that can be efficiently implemented
in hardware, e.g., in a switching network.



Parallel switching network topologies are a well studied subject in connection with
multiprocessor architectures and high speed communication switches. One practical example of
a suitable architecture for the permutation subset Ψ is a class of multi-layer switching networks
including, e.g., omega (perfect shuffle) / delta networks, log shifter networks, etc. These
networks offer reasonable implementation complexity and sufficient richness for the subset Ψ.
Additionally, multi-layer switching networks scale well, e.g., their complexity rises as N log N
where N is the number of inputs to the network, which makes them especially suitable for
massively parallel LDPC decoders. Alternatively, in decoders of the present invention with
relatively low levels of parallelism and small Z the subset Ψ of permutations can be
implemented in a single layer.
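
As a rough illustration of the multi-layer idea, the following sketch models a log
shifter in software. It assumes a power-of-two width N and a left cyclic rotation; the
function name log_shift is hypothetical. Each layer either passes its input through or
applies one fixed power-of-two shift, so log2 N layers of N two-way selections suffice,
which is consistent with the N log N growth noted above.

```python
def log_shift(vec, amount):
    """Rotate vec left by amount using log2(N) layers of fixed power-of-two shifts."""
    n = len(vec)
    if n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two width")
    out, stage = list(vec), 1
    while stage < n:
        if amount & stage:                  # one control bit per layer
            out = out[stage:] + out[:stage]
        stage <<= 1
    return out


print(log_shift(list(range(8)), 3))
```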

An LDPC graph is said to have "multiple edges" if any pair of nodes is connected
by more than one edge. A multiple edge is the set of edges connecting a pair of nodes that are
connected by more than one edge. Although it is generally undesirable for an LDPC graph to
have multiple edges, in many cases it may be necessary in the construction of vectorized graphs
that the projected graph possesses multiple edges. One can extend the notion of a parity check
matrix to allow the matrix entries to denote the number of edges connecting the associated pair
of nodes. The codeword definition is still the same: the code is the set of 0,1 vectors x satisfying
Hx=0 modulo 2. When vectorizing a projected graph with multiple edges, in accordance with
the invention, each edge within the multiple edge is replaced with a permutation matrix from Ψ
and these matrixes are added to yield the extended parity check matrix of the full code. Thus, a
j>1 in the parity check matrix H of the projected graph will be 'lifted' to a sum
σ_k + σ_{k+1} + ... + σ_{k+j-1} of permutation matrixes from Ψ. Usually, one will choose the
elements of the sum so that each entry of σ_k + σ_{k+1} + ... + σ_{k+j-1} is either 0 or 1, i.e.,
the full graph has no multiple edges.
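
The condition on the sum can be checked with a small sketch. It assumes cyclic
permutations and a double edge (j = 2) in the projected graph; the helper names are
hypothetical. Two distinct cyclic shifts never place a 1 in the same position, whereas
repeating a shift produces entries equal to 2, i.e., a multiple edge in the full graph.

```python
def cyclic_perm(Z, shift):
    """Z x Z cyclic permutation matrix; row r has its 1 in column (r + shift) mod Z."""
    return [[1 if c == (r + shift) % Z else 0 for c in range(Z)] for r in range(Z)]


def add(A, B):
    """Entry-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


Z = 4
distinct = add(cyclic_perm(Z, 0), cyclic_perm(Z, 1))   # two different shifts
repeated = add(cyclic_perm(Z, 1), cyclic_perm(Z, 1))   # the same shift twice

max_distinct = max(map(max, distinct))
max_repeated = max(map(max, repeated))
print(max_distinct, max_repeated)
```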

The above described lifting appears to have one limitation. Under the above
construction both the code length and the length of the encoded data unit must be multiples of Z.
This apparent limitation is easily overcome, however. Suppose the data unit to be encoded has
length AZ + B where A is a positive integer and B is between 1 and Z inclusive, and the desired
code length is CZ + D where C is a positive integer and D is between 0 and Z-1 inclusive. Let
E be the smallest positive integer such that EZ >= CZ + D + (Z-B). One can design a lifted
graph which encodes a (A+1)Z length data unit to produce a codeword of length EZ such that
the data unit appears as part of the codeword, and use this to produce the desired code
parameters as follows. Given a data unit of length AZ + B one concatenates Z-B zeros to
produce a data unit of length (A+1)Z. That data unit is encoded to produce a codeword of length
EZ. The Z-B zeros are not transmitted. Out of the other EZ - (Z-B) bits in the codeword one
selects EZ - CZ - D - (Z-B) bits and punctures them; note that the number of punctured bits is
between 0 and Z-1 inclusive. These bits will not be transmitted, so the actual number of
transmitted bits is EZ - (Z-B) - (EZ - CZ - D - (Z-B)) = CZ + D, which is the desired code
length. The receiver, knowing in advance about the additional zeros and punctured bits,
substitutes soft bits for the punctured bits indicating erasure, and substitutes soft bits for the
known zero bits indicating a zero value with largest possible reliability. The extended received
word, of length EZ, may now be decoded to recover the original data unit. In practice one
usually makes these adjustments by puncturing bits from only one vector node and declaring
known bits from only one vector node.
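
The arithmetic above can be worked through for concrete numbers. The following sketch
uses hypothetical values (a 100 bit data unit, a desired code length of 200 and Z = 16)
and a hypothetical helper name; it simply evaluates the quantities A, B, C, D and E as
defined in the preceding paragraph.

```python
def lifting_adjustment(data_len, code_len, Z):
    """Compute padding and puncturing counts for lengths that are not multiples of Z."""
    A, B = divmod(data_len - 1, Z)
    B += 1                                  # data_len = A*Z + B with 1 <= B <= Z
    C, D = divmod(code_len, Z)              # code_len = C*Z + D with 0 <= D <= Z-1
    pad = Z - B                             # known zeros appended to the data unit
    E = -(-(code_len + pad) // Z)           # smallest E with E*Z >= C*Z + D + (Z-B)
    punctured = E * Z - code_len - pad      # between 0 and Z-1 inclusive
    transmitted = E * Z - pad - punctured   # equals the desired code length
    return {"A": A, "B": B, "C": C, "D": D, "E": E, "pad": pad,
            "padded_data_len": (A + 1) * Z,
            "punctured": punctured, "transmitted": transmitted}


result = lifting_adjustment(data_len=100, code_len=200, Z=16)
print(result)
```

For these values the lifted code has length EZ = 224, of which 12 known zeros and 12
punctured bits are withheld, leaving the 200 transmitted bits that were asked for.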

Various decoder implementations which use the above discussed technique of
vectorizing LDPC graphs will now be addressed.

As discussed above, message-passing decoding of LDPC codes involves passing
messages along the edges of the graph representing the code and performing computations based
on those messages at the nodes of the graph, e.g., the variable and constraint nodes.

Given a vectorized LDPC graph one can vectorize the decoding process as
follows. The decoder operates as if it were decoding Z copies of the projected LDPC code
synchronously and in parallel. Control of the decoding process corresponds to the projected
LDPC graph and may be shared across the Z copies. Thus, we describe the decoder as operating
on vector messages traversing vector edges and being received by vector nodes, each vector
having Z elements. Sockets also become vectorized. In particular a vector node processor
might comprise Z node processors in parallel and, when a vector of messages (m_1,...,m_Z) is
delivered to the vector node processor, message m_i is delivered to the i-th processor. Thus no
routing or reordering of messages occurs within a vector node processor, i.e., the vector message
is aligned with the vector of processors in a fixed way.

One deviation from purely disjoint parallel execution of the Z projected graphs is
that messages are re-ordered within a vector message during the message passing process. We
refer to this re-ordering operation as a rotation. The rotation implements the permutation
operations defined by Ψ. Because of the rotations, the processing paths of the Z copies of the
projected graph mix, thereby linking them to form a single large graph. Control information
which specifies the rotations is needed in addition to the control information required for the
projected graph. Fortunately, the rotation control information can be specified using relatively
little memory.

While various permutations can be used for the rotations in accordance with the present
invention, the use of cyclic permutations is particularly interesting because of the ease with
which such permutations can be implemented. For simplicity, we will now assume that Ψ
comprises the group of cyclic permutations. In this case, our large LDPC graphs are constrained
to have a quasi-cyclic structure. For purposes of this example, let N be the number of variable
nodes in the graph and let M be the number of constraint nodes in the graph. First, we assume
that both N and M are multiples of Z, N = nZ and M = mZ, where Z will denote the order of
the cycle.

Let us view nodes as doubly indexed. Thus, variable node v_{i,j} is the j-th variable
node from the i-th copy of the projected graph. Since Ψ is the group of cyclic permutations,
variable node v_{i,j} is connected to a constraint node c_{a,b} if and only if variable node
v_{(i+k) mod Z, j} is connected to a constraint node c_{(a+k) mod Z, b} for k = 1,...,Z.
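
The connectivity condition for cyclic permutations can be tested directly on a single
Z x Z block. In the sketch below, which is illustrative only, entry P[a][i] of a block
records whether copy a of a projected constraint node is joined to copy i of a
projected variable node; the function names are hypothetical. A cyclic permutation
block is unchanged when both copy indices are advanced by the same k modulo Z, while a
non-cyclic permutation is not.

```python
def cyclic_block(Z, shift):
    """Z x Z cyclic permutation block; row a has its 1 in column (a + shift) mod Z."""
    return [[int(c == (r + shift) % Z) for c in range(Z)] for r in range(Z)]


def is_shift_invariant(P):
    """True if P[a][i] == P[(a+k) mod Z][(i+k) mod Z] for all a, i and k."""
    Z = len(P)
    return all(P[a][i] == P[(a + k) % Z][(i + k) % Z]
               for a in range(Z) for i in range(Z) for k in range(Z))


cyclic_ok = all(is_shift_invariant(cyclic_block(3, s)) for s in range(3))
swap = [[0, 1, 0],
        [1, 0, 0],
        [0, 0, 1]]                      # a permutation that is not cyclic
swap_ok = is_shift_invariant(swap)
print(cyclic_ok, swap_ok)
```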

The techniques of the present invention for representing a large graph using a much
smaller graph representation and rotation information will now be explained further in reference to
Figs. 10 through 16 which relate to vectorization of the graph 600. The techniques described with
reference to these figures can be applied to much larger LDPC graphs.

In accordance with the present invention, a larger graph can be generated by
replicating, i.e., implementing multiple copies, of the small graph shown in Fig. 6 and then
performing rotation operations to interconnect the various copies of the replicated graph. We refer
to the small graph within the larger graph structure as the projected graph.

Fig. 10 is a graph 1000 illustrating the result of making 3 parallel copies of the
small graph illustrated in Fig. 6. Variable nodes 602', 602" and 602"' correspond to the first
through third graphs, respectively, resulting from making three copies of the Fig. 6 graph. In
addition, check nodes 606', 606" and 606"' correspond to the first through third graphs,
respectively, resulting from making the three copies. Note that there are no edges connecting
nodes of one of the three graphs to nodes of another one of the three graphs. Accordingly, this
copying process, which "lifts" the basic graph by a factor of 3, results in three disjoint identical
graphs.

Fig. 11 illustrates the result of the copying process discussed above using
matrices 1102 and 1104. Note that to make three copies of the original graph each non-zero
element in the matrix 702 is replaced with a 3x3 identity matrix. Thus, each one in the matrix
702 is replaced with a 3x3 matrix having 1's along the diagonal and 0's everywhere else to
produce the matrix 1102. Note that matrix 1102 has 3 times the number of edges that matrix
702 had, 12 edges for each one of the 3 copies of the basic graph shown in Fig. 6. Here, variable
x_{i,j} corresponds to variable node v_{i,j}.

Fig. 12 shows the relationship between the (3x12) 36 edges, the (3x5) 15 variable
nodes, and the (3x4) 12 constraint nodes which make up graph 1000. As in the case of Fig. 8,
edges are enumerated from the variable node side.






For purposes of annotation, the first number used to identify a node, constraint, or
edge indicates the graph copy to which it belongs, e.g., the first, second or third graph
copy. The second number is used to identify the element number within the particular specified
copy of the basic graph.



For example, in row 1202' the value (1,2) is used to indicate edge 2 of the first 
copy of the graph while in row 1202" (2,2) is used to indicate edge 2 of the second copy of the
graph. 



Note that edge rows 1202', 1202", 1202"' are simply copies of row 804
representing three copies of the row of edges 804, shown in Fig. 8, as they relate to the variable
nodes. Similarly edge rows 1204', 1204" and 1204"' represent three copies of the row of edges
804' shown in Fig. 8 as they relate to the constraint nodes.

Let us briefly discuss how to modify the Fig. 9 decoder 900 to decode the Z=3
parallel graphs now defined. The node processor 908 will be made a vector node processor able
to process 3 identical nodes simultaneously in parallel. All outputs from the node processor 908
will be vectorized, thereby carrying 3 times the data previously carried. Hard decision memory
912 and edge message memory 906 will be made 3 times wider, each capable of writing or
reading 3 units (bits or K bit messages respectively) in parallel at the direction of a single
SIMD instruction. Outputs from these memories will now be vectors, 3 times wider than before.
The parity check verifier 914 and the output buffer 916 will also be suitably vectorized with all
processing suitably parallelized.






Let us now consider the introduction of rotations into our example. This can be
illustrated by replacing each of the 3x3 identity matrixes shown in Fig. 11 with 3x3 cyclic
permutation matrixes as shown in Fig. 13. Note that there are three possibilities for the cyclic
permutation matrix used in Fig. 13. It is possible to indicate the particular permutation matrix to
be substituted for an identity matrix by indicating whether the permutation matrix has a "1"
located in the first, second or third position in the first row of the permutation matrix. For
example, in the case of matrix 1302, beginning at the top left and proceeding to the bottom right
corner (vector constraint socket order) the rotations could be specified by the sequence (2, 2, 3,
3, 1, 1, 1, 3, 2, 1, 2, 3).
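
The encoding of a rotation by the position of the "1" in the first row can be made
concrete with a short sketch. The function names are hypothetical, and the sketch
assumes the usual cyclic (circulant) convention in which each row is the row above it
shifted one place to the right.

```python
def perm_from_first_row(Z, pos):
    """Cyclic permutation matrix whose first row has its 1 in position pos (1-based)."""
    return [[1 if c == (r + pos - 1) % Z else 0 for c in range(Z)] for r in range(Z)]


def apply(P, vec):
    """Multiply permutation matrix P by vec: output i is the input selected by row i."""
    return [vec[row.index(1)] for row in P]


P1 = perm_from_first_row(3, 1)   # identity: no rotation
P2 = perm_from_first_row(3, 2)
P3 = perm_from_first_row(3, 3)

print(P2)
print(apply(P2, ["y1", "y2", "y3"]))
print(apply(P3, ["y1", "y2", "y3"]))
```

Under this convention a rotation value of 2 delivers the second element of the vector
to the first position, a value of 3 delivers the third element, and a value of 1
leaves the vector unchanged.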

Fig. 14 illustrates the effect of performing the cyclic permutation (rotation) on the
constraint node side. Since the permutation is performed from the constraint node side, the
relationship between the edges, e.g., ordering, from the variable node side remains unchanged as
shown in rows 1402', 1402" and 1402"'. From the constraint side, however, the permutation
results in edges within a column, e.g., the edges within a specific vector edge, being reordered as
shown in rows 1404', 1404", 1404"'. This produces interconnections between nodes
corresponding to different copies of the projected graph.

Consider, for example, column 1 of rows 1404 in relationship to column 1 of
rows 1104 of Fig. 11. Note that as a result of the vector edge permutation operation, constraint
node C_{1,1} is now connected to edge (2,1) as opposed to edge (1,1), constraint node C_{2,1} is
coupled to edge (3,1) as opposed to edge (2,1) and constraint node C_{3,1} is coupled to edge (1,1)
as opposed to edge (3,1).

We discussed above how to vectorize decoder 900 to decode Z parallel copies of
the projected graph. By introducing switches into the message paths to perform rotations, we
decode the LDPC code defined in Fig. 13.

Figure 15 illustrates a decoder incorporating various features of the present
invention. The decoder 1500 fully vectorizes, with rotations, the decoder 900. Note that the
figure indicates Z=4 whereas our example has Z=3; in general we may have any Z>1, but in
practice Z values of the form 2^k for integer k are often preferable. Similarities with decoder 900
are apparent. In particular the decoder control module 1502 and the node degree memory 1510
function in the same or a similar manner as their respective counterparts 902 and 910 in decoder
900. For example, to decode the LDPC code defined in Figs. 13 and 14 the operation of these
components would be exactly the same as their counterparts in decoder 900 when decoding the
example graph 600. The edge message memory 1506 and hard decision memory 1512 are
vectorized versions of their counterparts 906 and 912 in decoder 900. Whereas in decoder 900
the memories stored single units (K bit messages or bits) the corresponding memories in decoder
1500 store sets, i.e., vectors, of messages, resulting in, e.g., Z x K bit messages being stored. These
vectors are written or read as single units using SIMD instructions. Thus the message identifiers
sent to these modules from the message ordering module 1504 are equivalent or similar to those
in decoder 900. The message ordering module 1504 has the additional role, beyond what its
counterpart 904 had in decoder 900, of storing and providing the permutation, e.g., rotation,
information. Recall that in decoding the example graph 600, decoder 900 stored in its message
ordering module 904 the edge sequence (1,4,7,2,9,11,3,5,12,6,8,10). Consider using decoder 1500 to
decode the code of Figs. 13 and 14. The message ordering module 1504 would store the same
above sequence for accessing message vectors during constraint node processing, and also store
the sequence (2, 2, 3, 3, 1, 1, 1, 3, 2, 1, 2, 3) which describes the rotations associated with the same
sequence of vector messages. This sequence serves as the basis to generate the rot signal which
is used by the message ordering module 1504 to cause the switches 1520 and 1522 to rotate
vector messages and vector hard decision bits respectively. (Note that the hard decision bits are
provided only during variable node processing mode.) The vector parity check verifier 1514 is a
vector version of its counterpart 914 in decoder 900. Note that the output convergence signal is
a scalar, as before. The output buffer 1516 serves the same purpose as buffer 916, but output
data is written as vectors. The vector node processor 1508 is, e.g., Z node processors, each as in
908, in parallel. These nodes would share the deg signals and C/V control signal from the
decoder control module 1502.






In order to facilitate the ability to output either soft or hard decoder decisions the
soft decisions generated by the variable processing unit are supplied to a soft decision input of
buffer 1516. Thus, at any time prior to completion of decoding, soft decisions may be obtained
from the output of buffer 1516.

Consider further how decoder 1500 would function decoding the example of Figs. 13 and 14.
Initially the edge message memory 1506 is populated with 0s. The decoder control module
1502 first toggles into variable node processing mode. The edge message memory 1506 vectors
(all 0s at this point) are read out in order and delivered to the vector node processor 1508 for
variable node processing. The vector node processor 1508 then outputs the received values
alone along each edge from a variable node; we will use y to denote these first messages to
indicate this. Thus, the outgoing vectors would be (y_{1,i}, y_{2,i}, y_{3,i}) for i=1,...,12 in increasing
order. The rot signal is used to control message re-ordering performed by switching circuits
1520, 1522. The rot signal from the message ordering module 1504 will cause the messages in
the vectors to be rotated to produce processed vectors as follows:

    (y_{2,1}, y_{3,1}, y_{1,1}),    (y_{3,2}, y_{1,2}, y_{2,2}),    (y_{1,3}, y_{2,3}, y_{3,3}),
    (y_{2,4}, y_{3,4}, y_{1,4}),    (y_{3,5}, y_{1,5}, y_{2,5}),    (y_{1,6}, y_{2,6}, y_{3,6}),
    (y_{3,7}, y_{1,7}, y_{2,7}),    (y_{2,8}, y_{3,8}, y_{1,8}),    (y_{1,9}, y_{2,9}, y_{3,9}),
    (y_{3,10}, y_{1,10}, y_{2,10}), (y_{1,11}, y_{2,11}, y_{3,11}), (y_{2,12}, y_{3,12}, y_{1,12}).

Once the processed vectors are written into edge memory 1506, in the indicated order, the
decoder control module 1502 will toggle into constraint mode. The stored vector messages will
then be read out in order (1,4,7,2,9,11,3,5,12,6,8,10). Thus, they will be presented to the vector
node processor 1508 in the order

    (y_{2,1}, y_{3,1}, y_{1,1}),    (y_{2,4}, y_{3,4}, y_{1,4}),    (y_{3,7}, y_{1,7}, y_{2,7}),
    (y_{3,2}, y_{1,2}, y_{2,2}),    (y_{1,9}, y_{2,9}, y_{3,9}),    (y_{1,11}, y_{2,11}, y_{3,11}),
    (y_{1,3}, y_{2,3}, y_{3,3}),    (y_{3,5}, y_{1,5}, y_{2,5}),    (y_{2,12}, y_{3,12}, y_{1,12}),
    (y_{1,6}, y_{2,6}, y_{3,6}),    (y_{2,8}, y_{3,8}, y_{1,8}),    (y_{3,10}, y_{1,10}, y_{2,10}).

The vector node processor 1508 is implemented as three (Z=3) node processors in parallel. The
1st element (message) of each message vector (set of messages) is delivered to the 1st node
processor; the 2nd message is delivered to the 2nd processor; and the 3rd message is delivered to
the 3rd processor, respectively. The deg signal, which indicates the degree of the current node
being processed, is supplied by the degree memory 1510 to the three parallel processors of
vector node processor 1508. At this point the deg signal indicates that constraints are (all)
degree 3 so the 1st processor would process y_{2,1}, y_{2,4}, and y_{3,7} for its first constraint node
and y_{3,2}, y_{1,9}, and y_{1,11} for its second. Similarly, the 2nd processor would process y_{3,1},
y_{3,4}, and y_{1,7} for its first constraint node and y_{1,2}, y_{2,9}, and y_{2,11} for its second.

Let m_{i,j} denote the outgoing message corresponding to the incoming y_{i,j}. As the
vectors are emerging from the vector node processor 1508, the rot signal to switch 1520 will
cause the vectors to be reordered so that the previous rotation is reversed, hence they arrive at
the edge memory as (m_{1,j}, m_{2,j}, m_{3,j}), in the order j=1,4,7,2,9,11,3,5,12,6,8,10. The messages
are written back into memory according to the message identifier order with which they were read,
so after writing they appear in memory as (m_{1,j}, m_{2,j}, m_{3,j}) in order j=1,...,12. The message
ordering module 1504 now toggles into variable node processing mode in response to a C/V
signal supplied by decoder control module 1502. The message vectors are then read out in order
j=1,...,12 and delivered to the vector node processor 1508 for variable node processing. This
completes an iteration.
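
The message flow of one iteration in this example can be traced with a short
simulation. This is only a sketch of the data movement, not of the node computations:
each message is represented by its index pair (i, j), the rotation and edge sequences
are the ones given above for Figs. 13 and 14, and the helper names rotate and unrotate
are hypothetical.

```python
Z = 3
constraint_order = [1, 4, 7, 2, 9, 11, 3, 5, 12, 6, 8, 10]
rotations = [2, 2, 3, 3, 1, 1, 1, 3, 2, 1, 2, 3]
rot = dict(zip(constraint_order, rotations))    # rotation value for vector edge j


def rotate(vec, r):
    """Rotation r (1-based): output position 1 receives input element r."""
    k = (r - 1) % len(vec)
    return vec[k:] + vec[:k]


def unrotate(vec, r):
    """Inverse of rotate(vec, r)."""
    n = len(vec)
    k = (r - 1) % n
    return vec[n - k:] + vec[:n - k]


# Variable node pass: vector j holds messages (1,j), (2,j), (3,j); each vector is
# rotated on its way into edge memory, written in order j = 1..12.
memory = {j: rotate([(i, j) for i in range(1, Z + 1)], rot[j]) for j in range(1, 13)}

# Constraint node pass: vectors are read in constraint socket order, and element p of
# every vector goes to node processor p.
stream = [memory[j] for j in constraint_order]
proc1_first = [v[0] for v in stream[0:3]]
proc1_second = [v[0] for v in stream[3:6]]
proc2_first = [v[1] for v in stream[0:3]]

# On the way back the rotation is reversed, restoring copy order in memory.
restored = {j: unrotate(memory[j], rot[j]) for j in constraint_order}

print(proc1_first, proc1_second, proc2_first)
```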

During variable node processing the vector node processor 1508 also outputs soft decoded
vectors which are stored in the output buffer 1516. It also outputs hard decisions which are
supplied to switching circuit 1522. The vectors of one bit hard decisions undergo the same
rotation operation as the message vectors at the corresponding times. The rotated hard decision
vectors produced by switching circuit 1522 are then arrayed in the hard decision memory 1512
where they are stored. As a result of applying the same rotation applied to the message vectors,
the hard decisions may be read out in the same order as the vector messages are read out during
constraint node processing. During constraint node processing the hard decisions are delivered
to the vector parity check verifier 1514 which performs Z parity checks in parallel. If all parity
checks are satisfied then the convergence signal, CON, is generated and emitted. In response to
receiving the convergence signal indicating successful decoding, the decoder control module
1502 stops the decoding process.
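
A software model of the vector parity check can be sketched as follows. It assumes the
rotated hard decision vectors arrive in constraint socket order, grouped by constraint
degree; the function names are hypothetical. Each vector constraint node XORs its
incoming Z-bit vectors element by element, giving Z parity results at once, and
convergence is declared only if every result is zero.

```python
from functools import reduce


def vector_parity(hard_vectors):
    """XOR Z-bit vectors element-wise; returns the Z parities of one vector constraint."""
    return [reduce(lambda a, b: a ^ b, bits) for bits in zip(*hard_vectors)]


def converged(stream, degrees):
    """True if every parity check of every vector constraint node is satisfied."""
    pos = 0
    for d in degrees:
        if any(vector_parity(stream[pos:pos + d])):
            return False
        pos += d
    return True


good = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]     # each position XORs to 0
bad = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]      # third position XORs to 1
print(vector_parity(good), vector_parity(bad))
print(converged(good, [3]), converged(good + bad, [3, 3]))
```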

It should be apparent that there are many variations to decoder 1500 that each
embody the current invention. For example, the switch 1520 could have instead been placed
along the data path between the edge message memory 1506 and the vector node processor
1508. Similarly, switch 1522 could have instead been placed along the data path between the
hard decision memory 1512 and the vector parity check verifier 1514. Such a replacement
would also involve appropriate adjustment of the timing of the rot signal. The hard decision
memory 1512, the vector parity check verifier 1514 and the attendant data paths need not be
used and are eliminated for decoder embodiments that perform a fixed number of iterations and
therefore do not require convergence detection. Many further variations will be apparent to
those skilled in the art in view of the present invention.

Fig. 16 illustrates a decoder 1600 which is implemented in accordance with
another embodiment of the present invention. The decoder 1600 includes many elements which
are the same as, or similar to, the elements of the decoder 1500. Accordingly, for the purposes
of brevity, such elements will be identified using the same reference numbers as used in Fig. 15
and will not be discussed again in detail. The decoder 1600 is capable of performing both
variable node and constraint node processing operations, e.g., message updating operations, at
the same time, e.g., simultaneously and independently. In contrast to the Fig. 15 decoder
implementation which may be described as a side-to-side decoder because of the way it toggles
between variable node and constraint node processing iterations, the decoder 1600 can be
described as an asynchronous iteration decoder since the variable node and constraint node
processing operations can be performed independently, e.g., simultaneously.

The decoder circuit 1600 includes a decoder control module 1602, a message
ordering module 1604, a first switching circuit 1621, V2C edge message memory 1606, a
constraint node vector processor (e.g., Z-constraint node processors in parallel) 1609, a second
switching circuit 1620, C2V edge message memory 1607, a variable node vector processor 1608
(e.g., Z-variable node processors in parallel), a hard decision memory 1612, and a third switch
1622 coupled together as illustrated in Fig. 16.

Various embodiments of individual constraint node processors and individual
variable node processors, Z of which can be used in parallel to implement the constraint node
vector processor 1609 and variable node processor 1608, respectively, are described in detail in
U.S. Provisional Patent Application 60/328,469, titled "Node Processors For Use in Parity
Check Decoders", which was filed on October 10, 2001, and which is hereby expressly
incorporated by reference. The inventors of the present patent application are also the named
inventors on the incorporated provisional patent application.

In order to support independent and/or parallel updating of constraint and
variable messages in the Fig. 16 embodiment separate edge message memories 1606, 1607 and
switching circuits 1620, 1621 are used to support constraint node and variable node processing
operations, respectively. As in the Fig. 15 embodiment, each of the message memories 1606,
1607 is capable of storing L (Z x K-bit) vector messages. Each vector message, e.g., column of
Z K-bit messages, in the memories 1606, 1607, can be read from or written to in a single read or
write operation.

V2C edge message memory 1606 is used to store V2C messages and therefore
has a write input coupled to the output of the switching circuit 1621 which receives data from
variable node vector processor 1608. The C2V message memory 1607 is used to store C2V
edge messages and therefore has a write input coupled to the output of the constraint node vector
processor 1609.






Switches 1620 and 1621 are used to couple the variable node vector processor
1608 to the output of the C2V edge message memory and the input of the V2C edge message
memory, respectively. In one particular embodiment message vectors are stored in vector
constraint socket order. Message vectors are written into C2V edge message memory 1607 and
read out of V2C edge message memory 1606 in vector constraint socket order, i.e., linearly, thus
no external control is required (edge index output from decoder control module 1602 passes
through message ordering module 1604 unchanged to its constraint edge index output).
Message vectors are read out of C2V edge message memory 1607 and written into V2C edge
message memory 1606 in vector variable socket order. The message ordering module 1604
generates the variable edge index signal which indicates this ordering. Note that this signal
controls reading of C2V edge message memory 1607 and is delivered to V2C edge message
memory 1606 after being delayed. The delay accounts for the time required for processing
performed by switches 1620 and 1621 and the vector variable node processor 1608. This delay
may be a function of the degree of the node being processed, as indicated in Fig. 16 by the
variable node degree signal.

To avoid processing pipeline stalls due to variable delay both constraint and 
variable nodes are ordered such that nodes of the same degree are processed in a contiguous 
fashion. Further reduction in pipeline stalls occurring on the boundary of node groups with 
different degrees can be achieved by sorting node groups by degree in a monotonic fashion e.g., 
increasing or decreasing degree order. For implementation simplicity embodiments 900, 1500
and 1600 assume increasing degree order. 

In the particular embodiment illustrated in Fig. 16 vectors are stored in vector
constraint node rotation order. Switch 1620 rotates the messages in each vector into variable
rotation as each C2V vector message proceeds to variable node vector processor 1608 and then
switch 1621 applies the inverse rotation to the outgoing V2C vector message corresponding to
the same vector edge. The rot signal delivered to switch 1620 is delivered to switch 1621 via
rotation inversion circuit 1624 after a delay matched to the processing time in the vector
constraint node processor. This delay may depend on the constraint node degree, as indicated by
the constraint node degree signal output by degree memory 1610.

The decoder 1600 includes decoder control module 1602. The decoder control
module operates in a similar manner to the previously discussed control module 1502.
However, in 1602 no C/V control signal is generated. The edge index generation function can
be provided by a counter which cycles through the entire set of vector edges before starting
over.

In addition to outputting the soft decisions, hard decisions are generated, one per
edge, by each of Z-variable node processing units in the vector variable node processor 1608
each time a vector V2C message is generated. While the vector messages are written into the
V2C edge message memory 1606, the Z x 1 bit hard decision outputs are written into the hard
decision output memory 1612 after being rotated by switch 1622. Switch 1622 and hard
decision memory 1612 operate under the same control signals as switch 1621 and V2C edge
message memory 1606, respectively.

The resulting Zx1 rotated vectors are delivered to the vector parity check verifier
1614 which includes Z parity check verifiers connected in parallel. The verifier 1614
determines if the parity checks are satisfied and if all are satisfied then a convergence signal is
generated and sent to the decoder control module. In response to receiving a signal indicating
convergence, the decoder control module stops decoding of the received codeword. In the
embodiment 1600 the convergence detection signal is available one iteration after a codeword
has been written to the output buffer since constraint verification for data from iteration N is
done during iteration N+1 and the convergence signal is available upon completion of iteration
N+1.
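As a rough illustration of Z parity check verifiers operating in parallel, the sketch below XORs the rotated Zx1 hard decision vectors belonging to one vector constraint node, element by element, and reports convergence only if every resulting parity bit is zero. The names `vector_parity` and `converged`, and the list-of-lists data layout, are assumptions for this example rather than details of verifier 1614.

```python
def vector_parity(rotated_vectors, z):
    """XOR Z-bit hard decision vectors element-wise: one parity bit per graph copy."""
    acc = [0] * z
    for vec in rotated_vectors:
        acc = [a ^ b for a, b in zip(acc, vec)]
    return acc


def converged(vector_constraints, z):
    """True only if every parity bit of every vector constraint is zero."""
    return all(not any(vector_parity(vectors, z)) for vectors in vector_constraints)


Z = 3
# Each inner list holds the rotated hard decision vectors on the edges of one
# vector constraint node (hypothetical data).
satisfied = [[[1, 0, 1], [1, 1, 0], [0, 1, 1]]]
violated = [[[1, 0, 0], [0, 0, 0]]]

ok = converged(satisfied, Z)
bad = converged(violated, Z)
```

Each position of the accumulated vector plays the role of one of the Z parallel verifiers, so a single non-zero bit is enough to withhold the convergence signal.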

Fig. 17 illustrates an embodiment similar to the decoder 1600 but which employs different
convergence detection circuitry. In the decoder 1700 constraint verification is
accomplished "on the fly" as the ZxK-bit output values X are written to the output buffer 1716.
In this case memory block 1712 keeps track of constraint status as vector constraints are
updated by hard decision output from variable node processor 1708. Each constraint status
memory location corresponds to a codeword parity check constraint. On the last update of any
constraint status location these parity check values are verified. If all verifications during
iteration N are satisfied then a convergence signal will be generated and output by the vector
parity check verifier 1714 immediately after iteration N. This will qualify to the decoder control
module 1702 that data in output buffer 1716 is valid. In the Fig. 17 embodiment the message
ordering module 1704 generates an additional signal defining constraint node index (as opposed
to edge index) which is not generated in the Fig. 16 embodiment. The constraint node index
identifies the constraint node destination of the current V2C message. This field serves as an
index to constraint status memory 1712 to which it is supplied.
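The "on the fly" verification can be sketched in software as a small status memory indexed by constraint node index. The class name `ConstraintStatus`, the per-node update counter used to detect the last update, and the sample degrees are assumptions for illustration; the text above specifies only that each status location is verified on its last update.

```python
class ConstraintStatus:
    """Toy model of a constraint status memory indexed by constraint node index."""

    def __init__(self, degrees, z):
        self.degrees = list(degrees)  # number of vector edges per vector constraint
        self.z = z
        self.start_iteration()

    def start_iteration(self):
        """Clear all parity status locations before a new iteration."""
        self.parity = [[0] * self.z for _ in self.degrees]
        self.remaining = list(self.degrees)
        self.failed = False

    def update(self, node_index, hard_bits):
        """Fold one rotated Z-bit hard decision vector into its constraint's status."""
        row = self.parity[node_index]
        for i, bit in enumerate(hard_bits):
            row[i] ^= bit
        self.remaining[node_index] -= 1
        # Verify on the last update of this status location.
        if self.remaining[node_index] == 0 and any(row):
            self.failed = True

    def converged(self):
        """True if every constraint was fully updated and none failed."""
        return not self.failed and all(r == 0 for r in self.remaining)


status = ConstraintStatus(degrees=[2, 2], z=2)
for node_index, bits in [(0, [1, 0]), (1, [0, 1]), (0, [1, 0]), (1, [0, 1])]:
    status.update(node_index, bits)
first_result = status.converged()

status.start_iteration()
for node_index, bits in [(0, [1, 0]), (0, [0, 0]), (1, [0, 0]), (1, [0, 0])]:
    status.update(node_index, bits)
second_result = status.converged()
```

Because the result is known as soon as the last message of the iteration has been folded in, no extra iteration is needed before signalling convergence, which is the advantage described for the Fig. 17 embodiment.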






While requiring a little more circuitry than the Fig. 15 embodiment, the Fig. 16
and Fig. 17 embodiments have the advantage of more efficient use of the vector constraint and
variable node processors 1609/1709, 1608/1708 since both vector node processors are utilized
fully during each processing iteration. In addition, decoding time is reduced as compared to the
Fig. 15 embodiment since constraint and variable node processing is performed in parallel, e.g.,
simultaneously.

The above described decoding methods allow for message passing decoding, e.g.,
LDPC decoding, to be performed using software and general purpose computers capable of
supporting SIMD operations. In such embodiments, one or more parallel processors serve as
vector processing units or hardware within a single processor may be used to perform multiple
vector processing operations in parallel. In such embodiments, the edge memory, permutation
map and information on the number of messages per node may all be stored in a common
memory, e.g., the computer's main memory. The message passing control logic and decoder
control logic may be implemented as software routines executed on the computer's processing
unit. In addition, the switching device may be implemented using software and one or more
SIMD processing instructions.
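A software embodiment of this kind might be organized roughly as follows. The sketch keeps the edge memory, the permutation map (here a rotation amount per vector edge) and the number of messages per node in ordinary lists, and performs one variable node pass. The function names, the sum-based update rule, the storage order and the sample values are all assumptions chosen for illustration; plain Python lists stand in for SIMD registers.

```python
def rotate(vec, rot):
    """Cyclic rotation of a set of Z messages (software stand-in for the switch)."""
    rot %= len(vec)
    return vec[rot:] + vec[:rot]


def variable_node_pass(edge_memory, rotations, degrees, channel):
    """Run one vector variable node pass and return the new edge memory.

    edge_memory: L sets of Z messages, stored in variable node order (assumed).
    rotations:   rotation amount for each vector edge (the permutation map).
    degrees:     number of vector edges attached to each vector variable node.
    channel:     one set of Z received values per vector variable node.
    """
    z = len(channel[0])
    new_memory = []
    edge = 0
    for node, degree in enumerate(degrees):
        # Each access fetches a whole set of Z messages, like a SIMD read.
        incoming = [rotate(edge_memory[edge + k], rotations[edge + k])
                    for k in range(degree)]
        total = [channel[node][i] + sum(m[i] for m in incoming) for i in range(z)]
        for k in range(degree):
            # Outgoing message excludes the message that arrived on the same edge.
            outgoing = [total[i] - incoming[k][i] for i in range(z)]
            # Undo the rotation before the set is written back as a unit.
            new_memory.append(rotate(outgoing, -rotations[edge + k]))
        edge += degree
    return new_memory


result = variable_node_pass(
    edge_memory=[[1, 2], [3, 4]],
    rotations=[0, 1],
    degrees=[2],
    channel=[[10, 20]],
)
```

On hardware with real SIMD support, each list comprehension over the Z positions would map onto one vector instruction, and the two `rotate` calls onto a permute or shuffle instruction.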

The above described LDPC decoding methods allow for LDPC decoding to be
performed on various hardware platforms such as Field Programmable Gate Arrays or in an
Application Specific Integrated Circuit. The present invention is especially useful in these
settings where the simple parallelism can be explicitly exploited.

Numerous additional variations on the decoding methods and apparatus of the
present invention will be apparent to those skilled in the art in view of the above description of
the invention. Such variations are to be considered within the scope of the invention.

What is claimed is:

1. An apparatus for performing message passing decoding operations, the apparatus comprising:
memory including a set of memory locations for storing L sets of Z K-bit messages, where Z is a positive integer greater than one and K and L are non-zero positive integers;
a node processor including a plurality of node processing units, each node processing unit for performing at least one of a constraint node processing operation and a variable node processing operation; and
a switching device coupled to the memory and to the node processing unit, the switching device for passing sets of Z K-bit messages between said memory and said node processor and for reordering the messages in at least one of said sets of messages in response to switch control information.

2. The apparatus of claim 1, further comprising:
a message ordering control module coupled to said switching device for generating said switch control information used to control the reordering of messages in said at least one set of messages.

3. The apparatus of claim 2, wherein the switching device includes circuitry for performing a message rotation operation to reorder messages included in a set of messages.

4. The apparatus of claim 2, wherein the message ordering control module stores information on the order sets of messages are to be read out of the memory and information indicating what reordering of messages is to be performed by said switch on individual sets of messages read out of the memory.

5. The apparatus of claim 2, wherein the message ordering control module is further coupled to said memory and sequentially generates set identifiers, each set identifier controlling the memory to access memory locations corresponding to a set of messages as part of a single read or write operation.

6. The apparatus of claim 5, wherein each set identifier is a single memory address.

7. The apparatus of claim 2, wherein said plurality of node processing units includes Z node processing units arranged in parallel, each one of the Z node processing units operating in parallel to process a different message in each set of Z messages passed between said memory and said node processor.

8. The apparatus of claim 7, wherein said memory includes an address input which allows each set of messages to be addressed as a unit thereby enabling a set of messages to be read from said memory in a single SIMD read operation.

9. The apparatus of claim 7, wherein said memory includes an address input which allows each set of messages to be addressed as a unit thereby enabling a set of messages to be written into said memory in a single SIMD write operation.



10. The apparatus of claim 1, wherein each of said plurality of node processing units includes a control signal input for receiving a control signal to switch node processing unit operation between a constraint node mode of processing operation and a variable node mode of processing operation.

11. The apparatus of claim 10, further comprising:
a decoder control device coupled to said plurality of node processing units, the decoder control device generating said control signal used to control said plurality of node processing units.

12. The apparatus of claim 11, wherein each of the Z processing units performs a variable node low density parity check message processing operation to generate at least one new message from at least one message received from said switching device.

13. The apparatus of claim 10,
wherein at least one of the plurality of node processing units includes information indicating a number of messages to be used in each of a plurality of sequential variable node processing operations.

14. The apparatus of claim 7,
wherein the decoder control device is further coupled to said message passing control device; and
wherein the message passing control device specifies a different order in which each of the L sets of Z messages are to be read out of the memory during the variable node mode of processing operation than during constraint node mode of processing operation.

15. The apparatus of claim 2, further comprising a decoder control module coupled to the message ordering module, the decoder control module including means for supplying information to the message ordering module used to control the order in which each of the L sets of Z messages are to be read out of said memory.

16. The apparatus of claim 15, wherein the decoder control device further includes means for supplying an edge index to the message ordering module which controls the generation of the set identifiers supplied to said memory.

17. The apparatus of claim 16, further comprising a degree memory coupled to the node processor for storing a set of node degree information.

18. The apparatus of claim 17, wherein the control device further generates a node index used to determine which node degree information in the stored set of node degree information is to be supplied to the node processor at any given time.

19. The apparatus of claim 1, further comprising:
a second node processor coupled to said memory, the second node processor including a second plurality of node processing units, each of the second plurality of node processing units for performing at least one of a constraint node processing operation and a variable node processing operation.

20. The apparatus of claim 19, further comprising:
additional memory coupling said node processor to said second node processor, the additional memory including an additional set of memory locations for storing L sets of Z K-bit messages.

21. The apparatus of claim 20, further comprising:
a second switching device coupling said node processor to said additional memory, the second switching device for passing sets of Z K-bit messages between said node processor and said additional memory and for reordering the messages in at least one of the sets of messages passed by the second switch.

22. The apparatus of claim 21,
wherein said node processor is a variable node processor for performing node decoder parity check processing operations;
wherein said additional node processor is a constraint node processor for performing constraint node parity check decoder processing operations.

23. The apparatus of claim 21, further comprising:
a parity check verifier, coupled to said additional node processor, for determining from an output of each of the second plurality of processing units included therein, when a parity check decoding operation has been successfully completed.



24. An apparatus for performing message passing decoding operations, the apparatus comprising:
first memory including a first set of memory locations for storing L sets of Z K-bit messages, where L and Z are positive integers greater than one and K is a non-zero positive integer;
a first node processor including a first plurality of node processing units, each node processing unit for receiving at least one K-bit message in each set of Z K-bit messages supplied to the first node processor; and
a first switching device coupling the first memory to the first node processor, the first switching device for passing sets of messages between the first node processor and the first memory and for reordering the messages in at least some of the sets of messages being passed by said first switch.

25. The apparatus of claim 24, further comprising:
a second memory coupled to said first node processor including a second set of memory locations for storing L sets of Z K-bit messages; and
a second node processor coupled to said second memory and to said first memory, the second node processor including a second plurality of node processing units.

26. The apparatus of claim 25, further comprising:
an additional switching device coupling the additional memory to the second node processor, the additional switching device for receiving sets of Z K-bit messages from said additional memory device and for supplying one message in each received set of Z messages to one of said second plurality of node processing units.

27. The apparatus of claim 24, wherein the first node processor is a variable node processor, the apparatus further comprising:
means, coupled to said plurality of processing units included in said first node processor, for determining from an output of each of said first plurality of node processing units when a decoding operation has been successfully completed.

28. A method of performing message passing decoding processing comprising the steps of:
storing L sets of K-bit messages in a memory, each set of K-bit messages including first through Z messages, where L and Z are positive integers greater than one and K is a non-zero positive integer;
reading one of said sets of K-bit messages from memory;
performing a message reordering operation on said read set of K-bit messages to produce a reordered set of Z K-bit messages;
supplying, in parallel, the Z messages in the reordered set of messages to a vector processor; and
operating the vector processor to perform message passing decoder operations using the Z supplied messages as input.




29. The method of claim 28, wherein said message passing decoder operations generate a set of Z decoder messages from the Z messages in the supplied reordered set of messages.

30. The method of claim 29, wherein the step of operating the vector processor to generate Z decoder messages includes the step of:
performing, in parallel, Z node processing operations.

31. The method of claim 30, wherein each of the Z node processing operations is one of a constraint node processing operation and a variable node processing operation.

32. The method of claim 28, further comprising:
generating a message set identifier indicating the set of Z messages to be read out of memory.

33. The method of claim 32, wherein the step of reading one of said sets of K-bit messages includes:
performing a SIMD read operation using said message set identifier to identify the set of messages to be read from memory.

34. The method of claim 28, further comprising:
performing a second message reordering operation, the second message reordering operation being performed on the generated set of Z decoder messages to produce a reordered set of generated decoder messages.

35. The method of claim 34, further comprising:
storing the reordered set of generated decoder messages in said memory.

36. The method of claim 35, wherein the step of storing the reordered set of generated decoder messages includes performing a SIMD write operation to write said reordered set of generated decoder messages into memory.

37. The method of claim 34, wherein the step of performing a second message reordering operation includes performing the inverse of the message reordering operation performed on said set of K-bit messages read from the memory.

38. The method of claim 28, further comprising:
accessing stored message set permutation information; and
wherein the step of performing a message reordering operation includes the step of:
performing said reordering as a function of the accessed stored message set permutation information.

39. The method of claim 37, wherein said message set permutation information includes cyclic rotation information.

40. The method of claim 28,
wherein said message passing decoder operations are variable node processing operations, each variable node processing operation including generating a decision value, and wherein the method further comprises:
examining decision values generated by operating the vector processor to determine if a decoding condition has been satisfied.

41. A method of performing message passing decoding processing, the method comprising the steps of:
operating a node vector processor to generate a set of Z K-bit messages, where L and Z are positive integers greater than one and K is a non-zero positive integer;
performing a message reordering operation on the generated set of Z K-bit messages to produce a reordered set of Z K-bit messages;
performing a single write operation to store the reordered set of Z K-bit messages in a memory device.

42. The method of claim 41, wherein the step of performing a single write operation includes performing a SIMD write operation to write the Z messages in the reordered set of messages into memory in parallel.




43. The method of claim 41, wherein the step of operating the node vector processor to generate a set of Z K-bit messages includes the step of:
[...]

44. [...] processing operations.

45. The method of claim 43, wherein the Z node processing operations are constraint node processing operations.

46. The method of claim 43, wherein performing the message reordering operation on the generated set of Z K-bit messages includes:
[...] reorder the messages in the set of messages.

47. A method of performing [...], comprising:
performing a SIMD read operation to read a stored set of messages;
performing a message reordering operation on the stored set of messages;
supplying the reordered set of messages to [...] node processing units arranged in parallel; and
operating the plurality of node processing units to generate a set of updated messages as a function of the supplied reordered set of messages.

48. The method of claim 47, further comprising the step of:
writing the updated messages in said set of updated messages into a memory device using a SIMD write operation.

49. The method of claim 48, further comprising the step of:
performing a message reordering operation on the updated messages in said set of updated messages prior to writing the updated messages into the memory device.